Abstract. We consider the problem of distributed convergence to efficient outcomes in coordination games
through dynamics based on aspiration learning. Under aspiration learning, a player continues to play an action 
as long as the rewards received exceed a specified aspiration level. Here, the aspiration level is a fading memory 
average of past rewards, and these levels also are subject to occasional random perturbations. A player becomes 
dissatisfied whenever a received reward is less than the aspiration level, in which case the player experiments with a 
probability proportional to the degree of dissatisfaction. Our first contribution is the characterization of the asymptotic 
behavior of the induced Markov chain of the iterated process in terms of an equivalent finite-state Markov chain. We 
then characterize explicitly the behavior of the proposed aspiration learning in a generalized version of coordination 
games, examples of which include network formation and common-pool games. In particular, we show that in generic 
coordination games the frequency at which an efficient action profile is played can be made arbitrarily large. Although 
convergence to efficient outcomes is desirable, in several coordination games, such as common-pool games, attainability 
of fair outcomes, i.e., sequences of plays at which players experience highly rewarding returns with the same frequency, 
might also be of special interest. To this end, we demonstrate through analysis and simulations that aspiration learning 
also establishes fair outcomes in all symmetric coordination games, including common-pool games. 
1. Introduction. Distributed coordination is of particular interest in many engineering systems.
Two examples are distributed overlay routing or network formation [S] and medium access
control [11] in wireless communications. In either case, nodes need to utilize their resources
efficiently so that a desirable global objective is achieved. For example, in network formation, nodes
need to choose their immediate links so that connectivity is achieved with a minimum possible com- 
munication cost, i.e., minimum number of links. Similarly, in medium access control, users need 
to establish a fair scheduling of accessing a shared communication channel so that collisions (i.e., 
situations in which two or more users access the common resource) are avoided. In these scenarios,
achieving coordination in a distributed and adaptive fashion to an efficient outcome is of special 
interest. 

The distributed yet coupled nature of these problems, combined with a desire for online adaptation,
motivates using models based on game theoretic learning [8, 23, 29]. In game theoretic learning,
each agent is endowed with a set of actions and a utility /reward function that depends on that 
agent's and other agents' actions. Agents then learn which action to play based only on their own 
previous experience of the game (actions played and utilities received). A major challenge in this
setting is that explicit utility function optimization may be impractical. This may be due to inherent 
complexity (e.g., a large number of players or actions), or the lack of any closed form expression for 
the utility function. Rather, rewards can be measured online. In terms of game theoretic learning, 
this eliminates adaptation based on an ability to compute a "best reply". Another obstacle to utility
maximization is that from any agent's perspective, the environment includes other adapting agents, 
and hence is nonstationary. Consequently, actions that may have been effective in the past need not 
continue to be effective. 

Motivated by these issues, this paper considers a form of distributed learning dynamics known 
as aspiration learning, where agents "satisfice" rather than "optimize". The aspiration learning
scheme is based on a simple principle of "win-stay, lose-shift" [22], according to which a successful
action is repeated while an unsuccessful action is dropped. The success of an action is determined 
by a simple comparison test of its performance with the player's desirable return (aspiration level). 
The aspiration level is updated to incorporate prior experience into the agent's success criterion. 
Through this learning scheme, agents learn to play their "best" action. 

The history of aspiration learning schemes starts with the pioneering work of [25], where satis- 
faction seeking behavior was used to explain social decision making. A simple aspiration learning 
model is presented in [22] , where games of two players and two actions are considered, and decisions 
are taken based on the "win-stay, lose-shift" rule. In the special case of two-player/two-action mutual
interest games and symmetric coordination games, references [21] and [14], respectively, show that
the payoff-dominant action profile is selected with probability close to one. Similar results are reported
in [5][T2]. However, contrary to [21] and [14], both models incorporate a small perturbation either
in the aspiration update [13] or in the action update [5].

Recent research efforts on equilibrium selection in games have focused on achieving distributed 
convergence to Pareto-efficient payoff profiles, i.e., payoff profiles at which no action change can
make a player better off while not making some other player worse off. For example, reference [17] 
introduced an aspiration learning algorithm that converges (in distribution) to action profiles that 
maximize social welfare in multiple player games. Some key characteristics of this algorithm are
that agents keep track of their most recent satisfactory action and satisfactory payoff (benchmark
action and payoff), and they update their actions by following a "win-stay, lose-shift" rule, where the
aspiration level is defined as the benchmark payoff. Convergence to the Pareto-efficient payoffs in
two player games also has been investigated by [2]. The learning algorithm considered in [2] has two
distinctive features: a) agents commit to playing a series of actions for a k-period interval, and b)
agents make decisions according to a "win-stay, lose-shift" rule, where aspiration levels are computed
as the running average payoff over all the previous k-period intervals. It is shown that, in two player
games, the agents' payoffs converge to a small neighborhood of the set of the Pareto-efficient payoffs
almost surely if k is sufficiently large.

In this paper, we also focus on achieving convergence to efficient payoff profiles (also part of 
the Pareto-efficient payoff profiles) in coordination games with a large number of players and actions.
Agents apply an aspiration learning scheme that is motivated by [13]. Our goal is to a) characterize
explicitly the asymptotic behavior of the process for generic games of multiple players and actions,
and b) derive conditions under which efficient payoffs are selected in large coordination games. Our 
main contribution is the characterization of the asymptotic behavior of the induced Markov chain 
by means of the invariant distributions of an equivalent finite-state Markov chain, whenever the 
experimentation probability becomes sufficiently small. This equivalence simplifies the analysis of 
what would otherwise be an infinite state Markov process. These results extend prior analysis on 
this type of aspiration learning schemes to games of multiple players and actions. We also specialize 
the results for a class of games that is a generalized version of so-called coordination games. In 
particular, we show that, in these games, the unique invariant distribution of the equivalent finite- 
state Markov chain puts arbitrarily large weight on the payoff-dominant action profiles if the step 
size of the aspiration-level update becomes sufficiently small. We finally demonstrate the utility 
of the learning scheme in network formation games, which is of independent interest, since prior
learning schemes on network formation are primarily based on best-response dynamics, e.g., [3]. 

While convergence to payoff-dominant action profiles in coordination games is desirable, an- 
other desirable property is a notion of fairness. In particular, for some coordination games where 
coincidence of interests is not so strong, such as the Battle of the Sexes (cf., [20, Section 2.3]), convergence
to a single action profile might not be fair to all agents, some of whom would probably rather be
in a different action profile. Instead, an alternation between several action profiles might be more 
desirable, usually described through distributions in the joint action space. An example of a class 
of such coordination games is so-called common-pool games, where multiple users need to coordi- 
nate on utilizing a limited common resource. The proposed aspiration learning algorithm also may 
provide a distributed and adaptive approach for convergence to fair outcomes in such symmetric 
coordination games, such as common-pool games. This property is of independent interest, since it 
is relevant to several scenarios of distributed resource allocation, such as medium access control in 
wireless communications.

In comparison to prior and other current work, this paper develops (and corrects) the specific 
model of aspiration learning in [13] beyond two player games. The paper goes on to derive specialized
results for coordination games involving convergence to efficient action profiles and fairness in
symmetric games. The results in [17] use a simpler finite state model of aspiration learning and are
applicable to almost all games. The results in [17] establish convergence to efficient action profiles,
but as yet do not specify selection/fairness among these action profiles. The model of [2] is more 
closely related to the present model, but with a different definition of aspiration levels and a differ- 
ent mechanism to perturb aspirations. The results of convergence to efficiency in [5] extend beyond 
coordination games while requiring two player games and do not specify fairness/selection among 
efficient profiles. 

The remainder of the paper is organized as follows. Section [2] defines coordination games 
and presents two special cases of coordination games, namely network formation and common- 
pool games. Section [3] presents the aspiration learning algorithm and its convergence properties in 
games of multiple players and actions. Section [4] specializes the convergence analysis to coordination 
games and establishes convergence to efficient outcomes. It also demonstrates the results through 
simulations in network formation games. Section [5] extends the convergence analysis to symmetric
coordination games and establishes conditions under which convergence to fair outcomes is also 
established. Finally, Section [6] presents concluding remarks.

Terminology: We consider the standard setup of finite strategic-form games. There is a finite
set of agents or players, $\mathcal{I} = \{1, 2, \dots, n\}$, and each agent $i$ has a finite set of actions, denoted by $\mathcal{A}_i$.
The set of action profiles is the Cartesian product $\mathcal{A} = \mathcal{A}_1 \times \cdots \times \mathcal{A}_n$; $\alpha_i \in \mathcal{A}_i$ denotes an action
of agent $i$; and $\alpha = (\alpha_1, \dots, \alpha_n) \in \mathcal{A}$ denotes the action profile or joint action of all agents. The
payoff/utility function of player $i$ is a mapping $u_i : \mathcal{A} \to \mathbb{R}$. A strategic-form game, denoted $\mathcal{G}$,
consists of the sets $\mathcal{I}$, $\mathcal{A}$ and the preference relation induced by the utility functions $u_i$, $i \in \mathcal{I}$. An
action profile $\alpha^* \in \mathcal{A}$ is a (pure) Nash equilibrium if

$$u_i(\alpha_i^*, \alpha_{-i}^*) \ge u_i(\alpha_i', \alpha_{-i}^*) \tag{1.1}$$

for all $i \in \mathcal{I}$ and $\alpha_i' \in \mathcal{A}_i$, where $-i$ denotes the complementary set $\mathcal{I} \setminus \{i\}$. We denote the set of
pure Nash equilibria by $\mathcal{A}^*$. In case the inequality (1.1) is strict, the Nash equilibrium is called a
strict Nash equilibrium. For the remainder of the paper, the term "Nash equilibrium" always refers
to a "pure Nash equilibrium."
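
The equilibrium condition (1.1) can be checked by brute force in small games. The following is a minimal sketch, assuming a hypothetical encoding in which each agent's action set is a Python list and utilities are given by a function `utility(i, profile)`; the two-player payoffs are made up for illustration and are not taken from the paper.

```python
from itertools import product

def pure_nash_equilibria(action_sets, utility):
    """Return all pure Nash equilibria of a finite strategic-form game.

    action_sets: list of per-agent action lists (hypothetical encoding).
    utility: function (i, profile) -> payoff of agent i at the profile.
    """
    equilibria = []
    for profile in product(*action_sets):
        stable = True
        for i, actions in enumerate(action_sets):
            current = utility(i, profile)
            for a in actions:
                # Unilateral deviation of agent i to action a.
                deviation = profile[:i] + (a,) + profile[i + 1:]
                if utility(i, deviation) > current:
                    stable = False
                    break
            if not stable:
                break
        if stable:
            equilibria.append(profile)
    return equilibria

# Illustrative two-player payoffs (assumed values, not from the paper).
PAYOFFS = {
    ("L", "L"): (2, 1), ("L", "R"): (0, 0),
    ("R", "L"): (0, 0), ("R", "R"): (1, 2),
}

def example_utility(i, profile):
    return PAYOFFS[profile][i]

EXAMPLE_EQUILIBRIA = pure_nash_equilibria([["L", "R"], ["L", "R"]], example_utility)
```

Exhaustive enumeration is only practical for small action spaces, which is one reason the paper turns to learning dynamics that need only measured rewards.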

2. Coordination Games. 

2.1. Definitions. Before defining coordination games, we first need to define the notion of 
better reply: 

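Definition 2.1 (Better reply). The better reply of agent $i \in \mathcal{I}$ to an action profile
$\alpha = (\alpha_i, \alpha_{-i}) \in \mathcal{A}$ is a set valued map $\mathrm{BR}_i : \mathcal{A} \to 2^{\mathcal{A}_i}$ such that for any $\alpha_i^* \in \mathrm{BR}_i(\alpha)$ we have
$u_i(\alpha_i^*, \alpha_{-i}) > u_i(\alpha_i, \alpha_{-i})$.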

A coordination game is defined as follows: 

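Definition 2.2 (Coordination game). A game of two or more agents is a coordination game
if there exists $\bar{\mathcal{A}} \subset \mathcal{A}$ such that the following conditions are satisfied:

(a) for any $\bar{\alpha} \in \bar{\mathcal{A}}$ and $\alpha \notin \bar{\mathcal{A}}$,

$$u_i(\bar{\alpha}) \ge u_i(\alpha) \quad \text{for all } i \in \mathcal{I}, \tag{2.1}$$

i.e., $\bar{\mathcal{A}}$ payoff-dominates $\mathcal{A} \setminus \bar{\mathcal{A}}$;

(b) for any $\alpha \in \mathcal{A} \setminus (\mathcal{A}^* \cup \bar{\mathcal{A}})$, there exist $i \in \mathcal{I}$ and action $\alpha_i' \in \mathrm{BR}_i(\alpha)$ such that

$$u_j(\alpha_i', \alpha_{-i}) \ge u_j(\alpha_i, \alpha_{-i}) \quad \text{for all } j \ne i; \tag{2.2}$$

(c) for any $\alpha^* \in \mathcal{A}^* \setminus \bar{\mathcal{A}}$ (if non-empty), there exist an action profile $\alpha' \in \mathcal{A}$ and a sequence of
distinct agents $j_1, \dots, j_{n-1} \in \mathcal{I}$ such that

$$u_i(\alpha'_{j_1}, \dots, \alpha'_{j_\ell}, \alpha^*_{-\{j_1, \dots, j_\ell\}}) < u_i(\alpha^*)$$

for all $i \in \{j_1, j_2, \dots, j_{\ell+1}\}$, $\ell = 1, 2, \dots, n-1$.

A strict coordination game refers to a coordination game with the inequality (2.1) being strict.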

The conditions of a coordination game establish a weak form of "coincidence of interests" and
define a larger class of games than the ones traditionally considered as coordination games, e.g.,
[HUE!]. For example, according to [16], one of the conditions that a coordination game needs to
satisfy is that payoff differences among players at any action profile are much smaller than payoff
differences among different action profiles. This condition reflects a form of coincidence of interests.
Definition 2.2(b) also establishes a similar form of coincidence of interests, but weaker in the sense
that it holds for at least one direction of action change.

Note also that existence of Nash equilibria is not necessary for a game to be a coordination game.
Furthermore, if $\mathcal{A}^* \subset \bar{\mathcal{A}}$, then Definition 2.2 can be written solely with respect to the desirable set
of profiles $\bar{\mathcal{A}}$. In that case, Definition 2.2(c) becomes vacuous since $\mathcal{A}^* \setminus \bar{\mathcal{A}} = \emptyset$.

A trivial example of a coordination game is the Stag-Hunt Game of Table 2.1.

$$\begin{array}{c|cc}
  & A & B \\ \hline
A & 4, 4 & 0, 2 \\
B & 2, 0 & 3, 3
\end{array}$$

Table 2.1
The Stag-Hunt Game



First, there exists a payoff-dominant profile, namely $(A, A)$, that can be identified as the desirable
set $\bar{\mathcal{A}}$, and satisfies Definition 2.2(a). Also, from any action profile outside $\mathcal{A}^* \cup \bar{\mathcal{A}}$, namely $(A, B)$ or
$(B, A)$, there is a better reply that improves the payoff for all agents (i.e., Definition 2.2(b) holds).
Lastly, for any Nash equilibrium profile outside $\bar{\mathcal{A}}$, i.e., $(B, B)$, there is a player (row or column)
and an action which makes everyone worse off (i.e., Definition 2.2(c) holds). Thus, the Stag-Hunt
game satisfies all the conditions of Definition 2.2.

Note finally that in some games, there might be multiple choices for the selection of the desirable
set $\bar{\mathcal{A}}$. For example, in the Stag-Hunt game of Table 2.1 an alternative selection of $\bar{\mathcal{A}}$ corresponds
to the union of the action profiles $(A, A)$ and $(B, B)$. In that case, both properties (a) and (b) of
Definition 2.2 hold, while property (c) is vacuous. In other words, the Stag-Hunt game is also a
coordination game with respect to the new selection of the desirable set $\bar{\mathcal{A}}$.

Claim 2.1. In any coordination game and for any action profile $\alpha \notin \mathcal{A}^* \cup \bar{\mathcal{A}}$ there exists a
sequence of action profiles $\{\alpha^k\}$, such that $\alpha^0 = \alpha$ and $\alpha_i^k \in \mathrm{BR}_i(\alpha^{k-1})$ for some $i$, which terminates at
an action profile in $\mathcal{A}^* \cup \bar{\mathcal{A}}$.

Proof. By Definition 2.2(b) there exist an agent $i \in \mathcal{I}$ and an action $\alpha_i^1 \in \mathrm{BR}_i(\alpha^0)$, such that
$u_i(\alpha_i^1, \alpha_{-i}^0) > u_i(\alpha_i^0, \alpha_{-i}^0)$ and $u_s(\alpha_i^1, \alpha_{-i}^0) \ge u_s(\alpha_i^0, \alpha_{-i}^0)$ for all $s \ne i$. Define $\alpha^1 = (\alpha_i^1, \alpha_{-i}^0)$.
Unless $\alpha^1 \in \mathcal{A}^* \cup \bar{\mathcal{A}}$, we can repeat the same argument to generate an action profile $\alpha^2$ and so
on. Thus, we construct a sequence $(\alpha^0, \alpha^1, \alpha^2, \dots)$ along which the map $\alpha \mapsto \sum_{i \in \mathcal{I}} u_i(\alpha)$ is strictly
monotone. However, since $\mathcal{A}$ is finite, the sequence must necessarily terminate at some $\alpha^k \in \mathcal{A}^* \cup \bar{\mathcal{A}}$
for $k < |\mathcal{A}|$. □

Note that when $\bar{\mathcal{A}} \subset \mathcal{A}^*$, then a direct consequence of Claim 2.1 is that coordination games are
weakly acyclic games (cf., [29]).
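
The construction in the proof of Claim 2.1 can be traced on the Stag-Hunt game of Table 2.1. The sketch below is illustrative only: the helper names (`deviate`, `better_replies`, `better_reply_path`) are hypothetical, and the desirable set is assumed to be the single profile (A, A). At each step it looks for a better reply that leaves no other agent worse off, as required by Definition 2.2(b), and stops at a Nash equilibrium or a desirable profile.

```python
from itertools import product

ACTIONS = ["A", "B"]
# Payoffs (row player, column player) of the Stag-Hunt game of Table 2.1.
U = {("A", "A"): (4, 4), ("A", "B"): (0, 2),
     ("B", "A"): (2, 0), ("B", "B"): (3, 3)}
PROFILES = list(product(ACTIONS, repeat=2))

def deviate(profile, i, a):
    """Profile obtained when agent i unilaterally switches to action a."""
    return profile[:i] + (a,) + profile[i + 1:]

def better_replies(i, profile):
    """Actions of agent i that strictly improve its own payoff."""
    return [a for a in ACTIONS if U[deviate(profile, i, a)][i] > U[profile][i]]

def is_nash(profile):
    return all(not better_replies(i, profile) for i in range(2))

def better_reply_path(start, desirable):
    """Follow better replies that hurt no other agent (Definition 2.2(b))
    until a Nash equilibrium or a desirable profile is reached."""
    path = [start]
    while not (is_nash(path[-1]) or path[-1] in desirable):
        cur = path[-1]
        step = None
        for i in range(2):
            for a in better_replies(i, cur):
                nxt = deviate(cur, i, a)
                if all(U[nxt][j] >= U[cur][j] for j in range(2) if j != i):
                    step = nxt
                    break
            if step is not None:
                break
        if step is None:
            raise ValueError("condition (b) fails at %r" % (cur,))
        path.append(step)
    return path

DESIRABLE = {("A", "A")}  # assumed choice of the desirable set
NASH = [p for p in PROFILES if is_nash(p)]
PATH = better_reply_path(("A", "B"), DESIRABLE)
```

Starting from (A, B), the first admissible better reply found is the row player's switch to B, so the path ends at the Nash equilibrium (B, B) rather than at the payoff-dominant profile. This matches the claim, which only guarantees termination in the union of the Nash equilibria and the desirable set.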

2.2. Network Formation Games. Network formation games are of particular interest in 
wireless communications due to their utility in modeling distributed topology control [24] and overlay
routing [BJ. Recent developments in distributed learning dynamics, e.g., [U, have also provided the 
tools for computing efficient solutions for these games in a distributed manner. 

To illustrate how a network formation game can be modeled as a coordination game, we introduce
a simple network formation game motivated by [12]. Let us consider $n$ nodes deployed on the
plane and assume that the set of actions of each agent $i$, $\mathcal{A}_i$, contains all possible combinations
of neighbors of $i$, denoted $\mathcal{N}_i$, with which a link can be established, i.e., $\mathcal{A}_i = 2^{\mathcal{N}_i}$. Links are
considered unidirectional, and a link established by node $i$ with node $s$, denoted $(s, i)$, starts at $s$
with the arrowhead pointing to $i$. A graph $G$ is defined as a collection of nodes and directed links.
Define also a path from $s$ to $i$ as a sequence of nodes and directed links that starts at $s$ and ends at
$i$ following the orientation of the graph, i.e.,

$$(s \to i) = \{s = s_0, (s_0, s_1), s_1, \dots, (s_{m-1}, s_m), s_m = i\}$$

for some positive integer $m$. In a connected graph, there is a path from any node to any other node.
Let us consider the utility function $u_i : \mathcal{A} \to \mathbb{R}$, $i \in \mathcal{I}$, defined by

$$u_i(\alpha) = \sum_{s \in \mathcal{I} \setminus \{i\}} \chi_\alpha(s \to i) - c\,|\alpha_i|, \tag{2.3}$$

where $|\alpha_i|$ denotes the number of links corresponding to $\alpha_i$ and $c$ is a constant in $(0, 1)$. Also,

$$\chi_\alpha(s \to i) = \begin{cases} 1, & \text{if } (s \to i) \subseteq G_\alpha, \\ 0, & \text{otherwise,} \end{cases}$$

where $G_\alpha$ denotes the graph induced by joint action $\alpha$. The resulting Nash equilibria are usually
called Nash networks [3]. As was shown in Proposition 4.2 in [4], a network $G^*$ is a Nash network
if and only if it is critically connected, i.e., i) it is connected, and ii) for any $(s, i) \in G^*$, $(s \to i)$
is the unique path from $s$ to $i$. For example, the resulting Nash networks for $n = 3$ agents and
unconstrained neighborhoods are shown in Fig. 2.1.




Fig. 2.1. Nash networks in case of $n = 3$ agents and $0 < c < 1$.

Let us define $\bar{\mathcal{A}}$ to be the following set of action profiles

$$\bar{\mathcal{A}} = \Bigl\{\alpha^* \in \mathcal{A} : u_i(\alpha^*) = \max_{\alpha \in \mathcal{A}} u_i(\alpha) \text{ for all } i \in \mathcal{I}\Bigr\},$$

which corresponds to the set of payoff-dominant networks. Note that payoff-dominant networks (if
they exist) are connected with a minimum number of links. Also, not all Nash networks are necessarily
payoff-dominant. For example, in Fig. 2.1(a), assuming that $0 < c < 1$, all players realize the same
utility, which is equal to $2 - c$. This is a strict Nash network since each agent can only be worse
off by unilaterally changing its links. It is also the payoff-dominant network. On the other hand,
Fig. 2.1(b) is a non-strict Nash network and is payoff-dominated by Fig. 2.1(a).

The utility function (2.3) corresponds to the connections model of [12] and has been used to
describe various economic and social contexts such as transmission of information. It has also
been applied for distributed topology control in wireless networks [15]. Practically, it constitutes a
measure of network connectivity, since the maximum utility for node $i$ is achieved when there is a
path from any other node to $i$.

Claim 2.2. The network formation game defined by (2.3) is a coordination game, provided the
set of payoff-dominant networks is non-empty.

Proof. For a joint action $\alpha \notin \mathcal{A}^*$ suppose that an agent $i$ picks the best reply in $\mathrm{BR}_i(\alpha) \ne \emptyset$
(i.e., the most profitable better reply). Then no other agent becomes worse off, since a best reply for
$i$ always retains connectivity. Note that this is not necessarily true for any other better reply. Thus,
Definition 2.2(b) is satisfied. In order to show property (c), consider any joint action $\alpha^*$ that is a
Nash network. If any one agent $j_1$ selects the action $\alpha'_{j_1}$ of establishing "no links", then there exists
at least one other agent $j_2 \ne j_1$ whose payoff becomes strictly less than the equilibrium payoff (e.g.,
pick $j_2$ such that $(j_1, j_2) \in G_{\alpha^*}$). This is due to the fact that $\alpha^*$ is critically connected. Continue in
the same manner by selecting $\alpha'_{j_2}$ to be the action of establishing "no links", and so on. This way,
we may construct a sequence of agents and an action profile which satisfies Definition 2.2(c) of a
coordination game. □

The condition that payoff-dominant networks exist is not restrictive. For example, if $\mathcal{N}_i = \mathcal{I} \setminus \{i\}$
for all $i$, then the set of wheel networks (cf., [4]) is payoff dominant.
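
The utility (2.3) is straightforward to evaluate numerically. The following is a minimal sketch under an assumed encoding: a joint action is a list `profile` in which `profile[i]` is the set of nodes `s` whose link `(s, i)` is established by agent `i`; the function names and the value `c = 0.5` are illustrative choices, not part of the paper.

```python
def reachable_to(i, profile):
    """Nodes s != i with a directed path s -> i in the graph induced by profile.

    profile[i] is the set of nodes s such that the link (s, i) is established
    by agent i, so the search walks the links backwards starting from i.
    """
    seen, stack = set(), [i]
    while stack:
        node = stack.pop()
        for s in profile[node]:
            if s != i and s not in seen:
                seen.add(s)
                stack.append(s)
    return seen

def utility(i, profile, c):
    """Connections-model utility (2.3): reachable sources minus link cost."""
    return len(reachable_to(i, profile)) - c * len(profile[i])

COST = 0.5  # assumed value of c in (0, 1)

# Wheel network on three nodes, 0 -> 1 -> 2 -> 0: each agent maintains one link.
WHEEL = [{2}, {0}, {1}]
WHEEL_UTILITIES = [utility(i, WHEEL, COST) for i in range(3)]

# Agent 0 drops its only link: it saves the cost but no longer hears anyone.
BROKEN = [set(), {0}, {1}]
BROKEN_UTILITIES = [utility(i, BROKEN, COST) for i in range(3)]
```

In the wheel every agent obtains $2 - c$, in agreement with the discussion of Fig. 2.1(a). When agent 0 removes its link, its own utility falls to zero and agent 1 also loses a source, which illustrates why a unilateral deviation from a critically connected network is costly.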

In a forthcoming section, we present a distributed optimization approach for achieving conver- 
gence to payoff-dominant networks through aspiration learning which is of independent interest. 

2.3. Common-Pool Games. Common-pool games refer to strategic interactions where two 
or more agents need to decide unilaterally whether or not to utilize a limited common resource. In 
such interactions, each agent would rather use the common resource by itself than share it with 
another agent, which is usually penalizing for both. 

We define common-pool games as follows: 

Definition 2.3 (Common-Pool Game). A common-pool game is a strategic-form game such
that for each agent $i \in \mathcal{I}$, $\mathcal{A}_i = \{p_0, p_1, \dots, p_{m-1}\}$, with $0 < p_0 < p_1 < \cdots < p_{m-1}$, and

$$u_i(\alpha) = \begin{cases}
1 - c_j, & \text{if } \alpha_i = p_j \text{ and } \alpha_i > \max_{\ell \ne i} \alpha_\ell, \\
-c_j + \tau_j, & \text{if } \alpha_i = p_j \text{ and } \exists s \in \mathcal{I} \setminus \{i\} \text{ s.t. } \alpha_s > \max_{\ell \ne s} \alpha_\ell, \\
-c_j, & \text{if } \alpha_i = p_j \text{ and } \nexists s \in \mathcal{I} \text{ s.t. } \alpha_s > \max_{\ell \ne s} \alpha_\ell,
\end{cases}$$

where $0 < c_0 < \cdots < c_{m-1} < 1$, $\tau_j > 0$ for all $j = 0, 1, \dots, m-2$, and

$$-c_0 < -c_{m-1} + \tau_{m-1} < \cdots < -c_0 + \tau_0 < 1 - c_{m-1}.$$

This definition of a common-pool game can be viewed as a finite-action analog of continuous-action
common-pool games defined in [19]. Table 2.2 presents an example of a common-pool game
of 2 players and 3 actions.





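$$\begin{array}{c|ccc}
    & p_0 & p_1 & p_2 \\ \hline
p_0 & -c_0,\ -c_0 & -c_0 + \tau_0,\ 1 - c_1 & -c_0 + \tau_0,\ 1 - c_2 \\
p_1 & 1 - c_1,\ -c_0 + \tau_0 & -c_1,\ -c_1 & -c_1 + \tau_1,\ 1 - c_2 \\
p_2 & 1 - c_2,\ -c_0 + \tau_0 & 1 - c_2,\ -c_1 + \tau_1 & -c_2,\ -c_2
\end{array}$$

Table 2.2
A common-pool game of 2 players and 3 actions.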



We call "successful" any action profile in which one player's action is strictly greater than any
other player's action. Any other situation corresponds to a "failure." In common-pool games, we
define the set of desirable action profiles $\bar{\mathcal{A}}$ as the set of successful action profiles, i.e.,

$$\bar{\mathcal{A}} = \Bigl\{\alpha \in \mathcal{A} : \exists i \in \mathcal{I} \text{ s.t. } \alpha_i > \max_{\ell \ne i} \alpha_\ell\Bigr\}. \tag{2.4}$$

For example, this set of joint actions corresponds to the off-diagonal action profiles in Table 2.2.
Moreover, the set $\bar{\mathcal{A}}$ payoff-dominates the set $\mathcal{A} \setminus \bar{\mathcal{A}}$.
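
The payoff structure of Definition 2.3 and the set (2.4) can be checked numerically for the two-player, three-action case of Table 2.2. The sketch below uses assumed numerical values for the costs $c_j$ and bonuses $\tau_j$ (chosen only to satisfy the ordering constraints of the definition) and represents the action $p_j$ by its index $j$; none of these values come from the paper.

```python
from itertools import product

C = [0.1, 0.2, 0.3]    # costs c_0 < c_1 < c_2 (illustrative values)
TAU = [0.3, 0.3, 0.3]  # bonuses tau_j (illustrative values)

def winner(profile):
    """Index of the agent whose action strictly exceeds all others, else None."""
    top = max(profile)
    holders = [i for i, a in enumerate(profile) if a == top]
    return holders[0] if len(holders) == 1 else None

def utility(i, profile):
    """Common-pool utility of agent i; action index j stands for level p_j."""
    j = profile[i]
    w = winner(profile)
    if w == i:
        return 1 - C[j]        # agent i succeeds
    if w is not None:
        return -C[j] + TAU[j]  # some other agent succeeds
    return -C[j]               # failure: nobody succeeds

PROFILES = list(product(range(3), repeat=2))
SUCCESSFUL = [p for p in PROFILES if winner(p) is not None]
FAILURES = [p for p in PROFILES if winner(p) is None]

def payoff_dominates(good, bad):
    """True if every agent strictly prefers any profile in good to any in bad."""
    return all(utility(i, g) > utility(i, b)
               for g in good for b in bad for i in range(2))

DOMINANCE_HOLDS = payoff_dominates(SUCCESSFUL, FAILURES)
```

With these values the successful profiles are exactly the off-diagonal entries, and every agent strictly prefers any of them to any failure, which is the strict form of Definition 2.2(a).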

Claim 2.3. Any common-pool game is a strict coordination game.

Proof. Let $\bar{\mathcal{A}}$ be defined as in (2.4). Note first that for any $\alpha^* \in \bar{\mathcal{A}}$ and $\alpha \in \mathcal{A} \setminus \bar{\mathcal{A}}$, we have
$u_i(\alpha^*) > u_i(\alpha)$ for all $i \in \mathcal{I}$. In other words, Definition 2.2(a) is satisfied.

Moreover, note that any $\alpha \notin \bar{\mathcal{A}}$ is not a Nash equilibrium. For any action profile $\alpha \notin \bar{\mathcal{A}}$, pick an
agent $i$ such that $i \in \arg\max_{s \in \mathcal{I}} \alpha_s$. Let us also assume that $\alpha_i = p_j$ for some $j \in \{0, 1, \dots, m-1\}$.
If $j > 0$, then agent $i$ can increase its utility by selecting action $p_k$ for any $k < j$. In that case, the
utility of any other agent either increases or remains the same. If, instead, $j = 0$, then agent $i$ can
increase its utility by selecting action $p_k$ for any $k > j$. In this case, the utility of any other agent
increases. Thus, Definition 2.2(b) is also satisfied.

Lastly, note that $\mathcal{A}^* \subset \bar{\mathcal{A}}$. To check this, consider any $\alpha \notin \bar{\mathcal{A}}$. As the previous discussion
revealed, there always exist an agent and a better reply for that agent, so $\alpha \notin \mathcal{A}^*$, i.e., $\mathcal{A}^* \subset \bar{\mathcal{A}}$. Thus,
Definition 2.2(c) is trivially satisfied. □

If we imagine that a common-pool game is played repeatedly over time, it would be desirable that 
i) failures are avoided, and ii) agents manage to equally share the time they succeed (i.e., access the 
common resource). In other words, convergence to a successful state may not be sufficient. Instead, 
a (possibly time-dependent) solution that equally divides the time-slots that each user utilizes the 
common resource would seem more appropriate. 

Distributed convergence to such solutions is currently an open issue in packet radio multiple- 
access protocols (see, e.g., Chapter 5]). In these scenarios, there are multiple users that compete 
for access to a single communication channel. Each user needs to decide whether or not to occupy
the channel in a given time-slot based only on local information. If more than one user is occupying 
the channel, then a collision occurs and the user needs to resubmit the data. An example of such 
multiple-access protocol is the Aloha protocol [I], where users decide on transmitting a packet 
according to a probabilistic pattern. In this line of work, the action space of each user consists of 
multiple power levels of transmission [26]. If a user transmits with a power level that is strictly
larger than the power level of any other user, then it is able to transmit successfully, otherwise a 
collision occurs and transmission is not possible. This game can be formulated in a straightforward 
manner as a common-pool game. 

In a forthcoming section we provide a distributed solution to this problem using aspiration 
learning which is of independent interest. 

3. Aspiration Learning. In this section, we define aspiration learning, motivated by [13]. For
some constants $\zeta > 0$, $\epsilon > 0$, $\lambda > 0$, $c > 0$, $0 < h < 1$, and $\underline{\rho}$, $\bar{\rho}$ such that

$$-\infty < \underline{\rho} < \min_{\alpha \in \mathcal{A},\, i \in \mathcal{I}} u_i(\alpha) \le \max_{\alpha \in \mathcal{A},\, i \in \mathcal{I}} u_i(\alpha) < \bar{\rho} < \infty,$$

the aspiration learning iteration initialized at $(\alpha(0), \rho(0))$ is described in Table 3.1.

According to this algorithm, each agent $i$ keeps track of an aspiration level $\rho_i$, which measures
player $i$'s desirable return and is defined as a perturbed fading memory average of its payoffs
throughout the history of play.

Given the current aspiration level $\rho_i(t)$, agent $i$ selects a new action $\alpha_i(t+1)$. If the previous
action $\alpha_i(t)$ provided utility at least $\rho_i(t)$, then the agent is "satisfied" and repeats the same action,
i.e., $\alpha_i(t+1) = \alpha_i(t)$. Otherwise, $\alpha_i(t+1)$ is selected randomly over all available actions, where
the probability of selecting again $\alpha_i(t)$ depends on the level of discontent measured by the difference
$u_i(\alpha(t)) - \rho_i(t) < 0$. The random variables $\{r_i(t) : t \ge 0,\ i \in \mathcal{I}\}$ are independent, identically
distributed and are referred to as the "tremble."

Let $\mathcal{X} = \mathcal{A} \times [\underline{\rho}, \bar{\rho}]^n$, i.e., pairs of joint actions $\alpha$ and vectors of aspiration levels $\rho_i$, $i \in \mathcal{I}$.
The set $\mathcal{A}$ is endowed with the product topology, $[\underline{\rho}, \bar{\rho}]$ with its usual Euclidean topology, and
$\mathcal{X}$ with the corresponding product topology. We also let $\mathcal{B}(\mathcal{X})$ denote the Borel $\sigma$-field of $\mathcal{X}$,
and $\mathcal{P}(\mathcal{X})$ the set of probability measures on $\mathcal{B}(\mathcal{X})$ endowed with the Prohorov topology, i.e., the
topology of weak convergence. The algorithm in Table 3.1 defines an $\mathcal{X}$-valued Markov chain. Let
$P_\lambda : \mathcal{X} \times \mathcal{B}(\mathcal{X}) \to [0, 1]$ denote its transition probability function, parameterized by $\lambda > 0$. We refer
to the process with $\lambda > 0$ as the perturbed process.

We let $C(\mathcal{X})$ denote the Banach space of real-valued continuous functions on $\mathcal{X}$ under the sup-norm
(denoted by $\|\cdot\|$) topology. For $f \in C(\mathcal{X})$ we define

$$P_\lambda f(x) = \int_{\mathcal{X}} P_\lambda(x, dy)\, f(y).$$

It is straightforward to verify that $P_\lambda$ has the Feller property, i.e., $P_\lambda f \in C(\mathcal{X})$ for all $f \in C(\mathcal{X})$.
Recall that $\mu_\lambda \in \mathcal{P}(\mathcal{X})$ is called an invariant probability measure for $P_\lambda$ if

$$(\mu_\lambda P_\lambda)(B) = \int_{\mathcal{X}} \mu_\lambda(dx)\, P_\lambda(x, B) = \mu_\lambda(B), \qquad B \in \mathcal{B}(\mathcal{X}).$$
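At every $t = 0, 1, \dots$, and for each $i \in \mathcal{I}$:

1. Agent $i$ plays $\alpha_i(t)$ and measures utility $u_i(\alpha(t))$.

2. Agent $i$ updates its aspiration level according to

$$\rho_i(t+1) = \mathrm{sat}\bigl[\rho_i(t) + \epsilon\,[u_i(\alpha(t)) - \rho_i(t)] + r_i(t)\bigr],$$

where

$$r_i(t) = \begin{cases} 0, & \text{w.p. } 1 - \lambda, \\ \mathrm{rand}[-\zeta, \zeta], & \text{w.p. } \lambda, \end{cases}$$

and

$$\mathrm{sat}[\rho] = \begin{cases} \bar{\rho}, & \text{if } \rho > \bar{\rho}, \\ \rho, & \text{if } \rho \in [\underline{\rho}, \bar{\rho}], \\ \underline{\rho}, & \text{if } \rho < \underline{\rho}. \end{cases}$$

3. Agent $i$ updates its action:

$$\alpha_i(t+1) = \begin{cases} \alpha_i(t), & \text{w.p. } \phi\bigl(u_i(\alpha(t)) - \rho_i(t)\bigr), \\ \mathrm{rand}(\mathcal{A}_i \setminus \alpha_i(t)), & \text{w.p. } 1 - \phi\bigl(u_i(\alpha(t)) - \rho_i(t)\bigr), \end{cases}$$

where

$$\phi(z) = \begin{cases} 1, & \text{if } z \ge 0, \\ \max(h, 1 + cz), & \text{if } z < 0. \end{cases}$$

4. Agent $i$ updates the time and repeats.

Table 3.1
Aspiration Learning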
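
One iteration of Table 3.1 can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation: the function names, the parameter dictionary, and all numerical values are assumptions, the Stag-Hunt payoffs of Table 2.1 are used as a test bed, and every agent is assumed to have at least two actions.

```python
import random

def phi(z, h, c):
    """Probability of repeating the current action given z = u - rho."""
    return 1.0 if z >= 0 else max(h, 1.0 + c * z)

def sat(rho, lo, hi):
    """Project the aspiration level onto the interval [lo, hi]."""
    return min(hi, max(lo, rho))

def aspiration_step(actions, rho, action_sets, utility, params, rng):
    """One synchronous update of all agents, following steps 1-3 of Table 3.1."""
    eps, lam, zeta, c, h, lo, hi = (params[k] for k in
        ("eps", "lam", "zeta", "c", "h", "rho_lo", "rho_hi"))
    new_actions, new_rho = [], []
    for i, a in enumerate(actions):
        u = utility(i, tuple(actions))                        # step 1
        tremble = rng.uniform(-zeta, zeta) if rng.random() < lam else 0.0
        new_rho.append(sat(rho[i] + eps * (u - rho[i]) + tremble, lo, hi))  # step 2
        if rng.random() < phi(u - rho[i], h, c):              # step 3
            new_actions.append(a)
        else:
            others = [b for b in action_sets[i] if b != a]
            new_actions.append(rng.choice(others))
    return new_actions, new_rho

# Stag-Hunt payoffs of Table 2.1; all parameter values below are assumptions.
STAG_HUNT = {("A", "A"): (4, 4), ("A", "B"): (0, 2),
             ("B", "A"): (2, 0), ("B", "B"): (3, 3)}
ACTION_SETS = [["A", "B"], ["A", "B"]]
PARAMS = {"eps": 0.05, "lam": 0.01, "zeta": 0.5, "c": 0.2, "h": 0.5,
          "rho_lo": -1.0, "rho_hi": 5.0}

def stag_hunt_utility(i, profile):
    return STAG_HUNT[profile][i]

def fraction_at(target, steps, seed):
    """Fraction of time the simulated process plays the profile `target`."""
    rng = random.Random(seed)
    actions, rho = ["B", "B"], [3.0, 3.0]
    hits = 0
    for _ in range(steps):
        hits += tuple(actions) == target
        actions, rho = aspiration_step(actions, rho, ACTION_SETS,
                                       stag_hunt_utility, PARAMS, rng)
    return hits / steps
```

A call such as `fraction_at(("A", "A"), 100000, seed=1)` gives an empirical estimate of how often the payoff-dominant profile is played for one parameter choice. With `lam` set to zero and aspirations equal to the realized payoffs, the state never moves, which corresponds to the pure strategy states introduced next in the text.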



Since $\mathcal{X}$ is a compact metric space and $P_\lambda$ has the Feller property, it admits an invariant probability
measure $\mu_\lambda$ [10, Theorem 7.2.3].

We are interested in the asymptotic behavior of the aspiration learning algorithm as the
"experimentation probability" $\lambda$ approaches zero. We say that a state $x \in \mathcal{X}$ is stochastically stable
if any collection of invariant probability measures $\{\mu_\lambda \in \mathcal{P}(\mathcal{X}) : \mu_\lambda P_\lambda = \mu_\lambda,\ \lambda > 0\}$ satisfies
$\liminf_{\lambda \downarrow 0} \mu_\lambda(x) > 0$. It turns out that the stochastically stable states comprise a finite subset of $\mathcal{X}$
which is defined next.

Definition 3.1. A pure strategy state is a state $s = (\alpha, \rho) \in \mathcal{X}$ such that for all $i \in \mathcal{I}$,
$u_i(\alpha) = \rho_i$. The set of pure strategy states is denoted by $\mathcal{S}$ and $|\mathcal{S}|$ denotes its cardinality.
Note that the set $\mathcal{S}$ is isomorphic to $\mathcal{A}$ and can be identified as such.

As customary, the Dirac measure in $\mathcal{P}(\mathcal{X})$ supported at $x \in \mathcal{X}$ is denoted by $\delta_x$. The objective in
this section is to characterize the set of stochastically stable states. Our main result is summarized
in the following theorem:

Theorem 3.2. There exists a unique probability vector $\pi = (\pi_1, \dots, \pi_{|\mathcal{S}|})$ such that for any
collection of invariant probability measures $\{\mu_\lambda \in \mathcal{P}(\mathcal{X}) : \mu_\lambda P_\lambda = \mu_\lambda,\ \lambda > 0\}$, we have

$$\lim_{\lambda \downarrow 0} \mu_\lambda(\cdot) = \hat{\mu}(\cdot) = \sum_{s \in \mathcal{S}} \pi_s\, \delta_s(\cdot),$$

where convergence is in the weak* sense.

As we show later, $\pi$ in Theorem 3.2 is the unique invariant distribution of a finite-state Markov
chain.

Remark 3.1. The expected asymptotic behavior of aspiration learning can be characterized by $\hat{\mu}$
and, therefore, $\pi$. In particular, by Birkhoff's individual ergodic theorem, e.g., [10, Theorem 2.3.4],
and the weak convergence of $\mu_\lambda$ to $\hat{\mu}$, the expected percentage of time that the process spends in any
$B \in \mathcal{B}(\mathcal{X})$ such that $\partial B \cap \mathcal{S} = \emptyset$ is given by $\hat{\mu}(B)$ as the experimentation probability $\lambda$ approaches
zero and time increases.

The proof of Theorem 3.2 requires a series of propositions, which comprise the remainder of this
section.

Let $P(\cdot\,, \cdot)$ denote the transition probability function on $\mathcal{X} \times \mathcal{B}(\mathcal{X})$ corresponding to $\lambda = 0$. We
refer to the process $\{X_t : t \ge 0\}$ governed by $P$ as the unperturbed process. Let $\Omega = \mathcal{X}^\infty$ denote the
canonical path space, i.e., an element $\omega \in \Omega$ is a sequence $\{\omega(0), \omega(1), \dots\}$, with $\omega(t) = (\alpha(t), \rho(t)) \in \mathcal{X}$.
We use the same notation for the elements $(\alpha, \rho)$ of the space $\mathcal{X}$ and for the coordinates of the
process $X_t = (\alpha(t), \rho(t))$. Let also $\mathbb{P}_x$ denote the unique probability measure induced by $P$ on the
product $\sigma$-algebra of $\mathcal{X}^\infty$, initialized at $x = (\alpha, \rho)$, and $\mathbb{E}_x$ the corresponding expectation operator.
Let also $\mathcal{F}_t = \sigma(X_\tau,\ \tau \le t)$, $t \ge 0$, denote the $\sigma$-algebra generated by $\{X_\tau,\ \tau \le t\}$.

For $t \ge 0$ define the sets

$$A_t = \{\omega \in \Omega : \alpha(\tau) = \alpha(t) \text{ for all } \tau \ge t\},$$

$$B_t = \{\omega \in \Omega : \alpha(\tau) = \alpha(0) \text{ for all } 0 \le \tau \le t\}.$$

Note that $\{B_t : t \ge 0\}$ is a non-increasing sequence, i.e., $B_{t+1} \subset B_t$, while $\{A_t : t \ge 0\}$ is
non-decreasing. Recall that the shift operator $\theta_t : \Omega \to \Omega$, $t \ge 0$, satisfies $X_s(\theta_t(\omega)) = X_{s+t}(\omega)$.
Therefore $A_t = \theta_t^{-1}(B_\infty)$. Let $A_\infty = \bigcup_{t=1}^{\infty} A_t$ and $B_\infty = \bigcap_{t=1}^{\infty} B_t$. The set $A_\infty$ is the event that
agents eventually play the same action profile, while $B_\infty$ is the event that agents never change their
actions. For $D \in \mathcal{B}(\mathcal{X})$ we let $\tau(D)$ denote the first hitting time of $D$, i.e.,

$$\tau(D) = \inf\{t \ge 0 : X_t \in D\}. \tag{3.1}$$


Proposition 3.3. It holds that

$$\inf_{x \in \mathcal{X}} \mathbb{P}_x(B_\infty) > 0$$

and

$$\inf_{x \in \mathcal{X}} \mathbb{P}_x(A_\infty) = 1.$$
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Proof. Assume that the process is initialized at X = x = (a,p). Note that B t consists of those 
sample paths which satisfy 

Pi(r) = Ui(a) - (1 - e) r (u ?; (a) - p) , 0<r<t, i£l. 

Therefore, we have: 

p*(s*)= n n max { /i ' i " c ( i " e ) T (^~^( a )) + }' ( 3 - 2 ) 

o<r<t iei 

where 



(x) 



+ A 



x , if x > , 
, othewise. 



Let T satisfy c(l - e) T ° (p- p) < min {1 - ft,, e} . Then 

p»(Bt) > n n (i-c(i-e) r (p-«,(«)) + ) 

iei T <T<t 

i£X V r=T n + l / 

>^oTT fl- ( l- e ) (P>-^) + ) w>To; 

and since the sequence {B t } is non- increasing, also for all t > 0. Therefore, by continuity from 
above, we obtain inf^* Px(-Boo) > e n h nTo , which proves the first claim. 
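Next, define the set

\[
D_\ell \triangleq \bigl\{(\alpha, \rho) \in \mathcal{X} : \rho_i - u_i(\alpha) \le (1-\epsilon)^{\ell}(\bar\rho - \underline\rho), \;\forall i \in \mathcal{I}\bigr\}, \qquad \ell \ge 0,
\]

and note that $\mathbb{P}_x(B_\ell) \le P^{\ell}(x, D_\ell)$, where $P^{t}$, $t \ge 0$, denotes the multistage transition probability
function defined by the recursion $P^{t} = P^{t-1}P$ and $P^{0} = I$. Thus, using the Markov property over
$k$ time blocks of length $\ell$, we obtain the rough estimate

\begin{align*}
\mathbb{P}_x(\tau(D_\ell) > k\ell) &\le \mathbb{P}_x(X_{j\ell} \in D_\ell^{c},\; j = 1, \dots, k) \\
&\le \mathbb{P}_x(X_{j\ell} \in D_\ell^{c},\; j = 1, \dots, k-1)\,\Bigl(\sup_{z \in \mathcal{X}} P^{\ell}(z, D_\ell^{c})\Bigr) \\
&\le \Bigl(1 - \inf_{z \in \mathcal{X}} \mathbb{P}_z(B_\ell)\Bigr)\, \mathbb{P}_x(X_{j\ell} \in D_\ell^{c},\; j = 1, \dots, k-1). \tag{3.3}
\end{align*}

Let $q_0 \triangleq 1 - \inf_{z \in \mathcal{X}} \mathbb{P}_z(B_\infty)$. We have already shown that $q_0 < 1$. Finite induction on (3.3) yields

\[
\mathbb{P}_x(\tau(D_\ell) > k\ell) \le \Bigl(1 - \inf_{z \in \mathcal{X}} \mathbb{P}_z(B_\ell)\Bigr)^{k} \le q_0^{k}.
\]

We have

\[
\mathbb{P}_x(A_\infty) \ge \sum_{t=0}^{k\ell} \mathbb{P}_x\bigl(\tau(D_\ell) = t,\; X \circ \theta_t \in B_\infty\bigr),
\]

and thus using the Markov property together with the fact that $X_{\tau(D_\ell)} \in D_\ell$ a.s. on $\{\tau(D_\ell) < \infty\}$,
and setting $k = \ell$, we obtain

\begin{align*}
\mathbb{P}_x(A_\infty) &\ge \sum_{t=0}^{\ell^2} \mathbb{P}_x(\tau(D_\ell) = t)\, \inf_{y \in D_\ell} \mathbb{P}_y(B_\infty) \\
&\ge \bigl(1 - \mathbb{P}_x(\tau(D_\ell) > \ell^{2})\bigr)\, \inf_{y \in D_\ell} \mathbb{P}_y(B_\infty) \\
&\ge (1 - q_0^{\ell})\, \inf_{y \in D_\ell} \mathbb{P}_y(B_\infty). \tag{3.4}
\end{align*}

It is clear by (3.2) that $\inf_{x \in D_\ell} \mathbb{P}_x(B_\infty) \to 1$ as $\ell \to \infty$. Therefore both terms on the right hand
side of (3.4) converge to 1 as $\ell \to \infty$, and the proof is complete. □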
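
The first part of the proof can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes the fading-memory update $\rho(t+1) = \rho(t) + \epsilon\,(u - \rho(t))$ implied by the closed form above, takes the product in (3.2) over $0 < \tau \le t$, and assumes that all aspirations and payoffs lie in an interval `[rho_lo, rho_hi]`. All function names and parameter values are hypothetical.

```python
def aspiration_path(rho0, u, eps, t_max):
    """Iterate rho(t+1) = rho(t) + eps * (u - rho(t)) while the payoff stays at u."""
    path = [rho0]
    for _ in range(t_max):
        path.append(path[-1] + eps * (u - path[-1]))
    return path


def closed_form(rho0, u, eps, t):
    """Closed form u - (1 - eps)^t (u - rho0) of the recursion above."""
    return u - (1.0 - eps) ** t * (u - rho0)


def stay_probability(rho0s, us, eps, h, c, t):
    """Product over 0 < tau <= t and agents of max{h, 1 - c (1-eps)^tau (rho_i - u_i)^+}."""
    prob = 1.0
    for tau in range(1, t + 1):
        for rho0, u in zip(rho0s, us):
            gap = max(rho0 - u, 0.0)
            prob *= max(h, 1.0 - c * (1.0 - eps) ** tau * gap)
    return prob


def smallest_T0(eps, h, c, rho_lo, rho_hi):
    """Smallest T0 with c (1-eps)^T0 (rho_hi - rho_lo) <= min{1 - h, eps}."""
    T0 = 0
    while c * (1.0 - eps) ** T0 * (rho_hi - rho_lo) > min(1.0 - h, eps):
        T0 += 1
    return T0


# Illustrative (assumed) setup: two agents, aspirations and payoffs in [0, 1].
eps, h, c = 0.05, 0.01, 0.2
payoffs = [0.3, 0.5]
initial_aspirations = [1.0, 0.9]

T0 = smallest_T0(eps, h, c, rho_lo=0.0, rho_hi=1.0)
lower_bound = eps ** len(payoffs) * h ** (len(payoffs) * T0)
for t in (10, 100, 1000):
    p = stay_probability(initial_aspirations, payoffs, eps, h, c, t)
    print(t, p, ">=", lower_bound)
```

The printed probabilities decrease in $t$ but stay above $\epsilon^{n} h^{nT_0}$, which is the uniform lower bound used for $\mathbb{P}_x(B_\infty)$. The bound is very loose in this example; only its positivity matters for the argument.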

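Proposition 3.4. There exists a transition probability function $\Pi$ on $\mathcal{X} \times \mathcal{B}(\mathcal{X})$ that has the
Feller property and $\Pi(x, \cdot)$ is supported on $\mathcal{S}$ for all $x \in \mathcal{X}$, and such that

(i) For all $f \in C(\mathcal{X})$, $\lim_{t \to \infty} \|P^{t} f - \Pi f\|_\infty = 0$.

(ii) If $R_\lambda$ is a resolvent of $P$, defined by

\[
R_\lambda \triangleq \varphi(\lambda) \sum_{t=0}^{\infty} (1 - \varphi(\lambda))^{t} P^{t},
\]

where $\varphi(\lambda) \in (0, 1)$, $\lambda > 0$, and $\lim_{\lambda \to 0} \varphi(\lambda) = 0$, then

\[
\lim_{\lambda \to 0} \|R_\lambda f - \Pi f\|_\infty = 0 \qquad \forall f \in C(\mathcal{X}).
\]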

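Proof. For $f \in C(\mathcal{X})$ and $x \in \mathcal{X}$, we have $\mathbb{E}_x[f(X_t)] = P^{t} f(x)$. Since $A_t = \theta_t^{-1}(B_\infty)$, then using
the Markov property we obtain that, for any positive $t$ and $t'$,

\begin{align*}
|P^{2t} f(x) - P^{2t+t'} f(x)| &= \bigl|\mathbb{E}_x[f(X_{2t}) - f(X_{2t+t'})]\bigr| \\
&\le \bigl|\mathbb{E}_x[(f(X_{2t}) - f(X_{2t+t'}))\,\mathbf{1}_{A_t}]\bigr| + \bigl|\mathbb{E}_x[(f(X_{2t}) - f(X_{2t+t'}))\,\mathbf{1}_{A_t^{c}}]\bigr| \\
&\le \mathbb{E}_x\bigl[\mathbb{E}\bigl[|f(X_{2t}) - f(X_{2t+t'})|\,\mathbf{1}_{A_t} \,\big|\, \mathcal{F}_t\bigr]\bigr] + 2\,\mathbb{P}_x(A_t^{c})\,\|f\|_\infty \\
&\le \mathbb{E}_x\bigl[\mathbb{E}_{X_t}\bigl[|f(X_{t}) - f(X_{t+t'})|\,\mathbf{1}_{B_\infty}\bigr]\bigr] + 2\,\mathbb{P}_x(A_t^{c})\,\|f\|_\infty \\
&\le \sup_{z \in \mathcal{X}} \mathbb{E}_z\bigl[|f(X_t) - f(X_{t+t'})|\,\mathbf{1}_{B_\infty}\bigr] + 2\,\mathbb{P}_x(A_t^{c})\,\|f\|_\infty. \tag{3.5}
\end{align*}

Since for any initial condition $x = (\alpha, \rho)$ the dynamics on $B_\infty$ evolve according to

\[
\rho(t) = g(t; \alpha, \rho) \triangleq u(\alpha) - (1-\epsilon)^{t}\,(u(\alpha) - \rho),
\]

the continuity of $f$ (which is necessarily uniform since $\mathcal{X}$ is compact) yields

\begin{align*}
\sup_{t' \ge 0}\; \sup_{(\alpha,\rho) \in \mathcal{X}} \mathbb{E}_{(\alpha,\rho)}\bigl[|f(X_t) - f(X_{t+t'})|\,\mathbf{1}_{B_\infty}\bigr]
&\le \sup_{t' \ge 0}\; \sup_{(\alpha,\rho) \in \mathcal{X}} \bigl|f(\alpha, g(t; \alpha, \rho)) - f(\alpha, g(t+t'; \alpha, \rho))\bigr| \\
&\xrightarrow[t \to \infty]{} 0. \tag{3.6}
\end{align*}

By (3.5), (3.6) and Proposition 3.3 we obtain

\[
\sup_{t' \ge 0} \|P^{2t} f - P^{2t+t'} f\|_\infty \xrightarrow[t \to \infty]{} 0.
\]

Therefore, the sequence $\{P^{t} f, \; t \in \mathbb{N}\}$ is Cauchy in $(C(\mathcal{X}), \|\cdot\|_\infty)$, and hence converges in $C(\mathcal{X})$.
Let $\phi(f)(x) \triangleq \lim_{t \to \infty} P^{t} f(x)$. Then for each $x$, $f \mapsto \phi(f)(x)$ defines a bounded linear functional
on $C(\mathcal{X})$. It is a positive functional since $\phi(f)(x) \ge 0$ for $f \ge 0$, and if $\mathbf{1}$ denotes the constant
function equal to 1, $\phi(\mathbf{1})(x) = 1$. Then, by the Riesz representation theorem, $f \mapsto \phi(f)(x)$ is represented by a Borel
probability measure on $\mathcal{X}$ for each $x$. Denote this by $\Pi(x, \cdot)$. Since $\phi : C(\mathcal{X}) \to C(\mathcal{X})$, it follows that
$\Pi$ has the Feller property. Also, by the definition of $\Pi$, we have

\[
\|P^{t} f - \Pi f\|_\infty \xrightarrow[t \to \infty]{} 0 \qquad \forall f \in C(\mathcal{X}). \tag{3.7}
\]

This proves (i).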

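Next, using the triangle inequality, we have for each $T > 0$,

\[
\|R_\lambda f - \Pi f\|_\infty \le \varphi(\lambda) \sum_{t=0}^{T} (1 - \varphi(\lambda))^{t}\, \|P^{t} f - \Pi f\|_\infty + (1 - \varphi(\lambda))^{T} \sup_{t > T} \|P^{t} f - \Pi f\|_\infty.
\]

Letting $\lambda \downarrow 0$, the first term vanishes since $\varphi(\lambda) \to 0$, and we obtain

\[
\limsup_{\lambda \downarrow 0} \|R_\lambda f - \Pi f\|_\infty \le \sup_{t > T} \|P^{t} f - \Pi f\|_\infty \qquad \forall T > 0,
\]

and (ii) follows by (3.7). □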
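
Claim (ii) can be illustrated on a finite chain, where the resolvent series can be summed directly. The sketch below is only an analogy under stated assumptions: the two-state matrix `P`, the test function `f`, and all function names are hypothetical, the series is truncated once its weights fall below a tolerance, and $\Pi f$ is approximated by applying `P` many times.

```python
def mat_vec(P, v):
    """Apply a transition matrix to a function on states: (P v)(x) = sum_y P[x][y] v[y]."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]


def resolvent_apply(P, f, phi, tol=1e-14):
    """Truncated series R f = phi * sum_{t >= 0} (1 - phi)^t P^t f."""
    result = [0.0] * len(f)
    v = list(f)        # holds P^t f
    weight = phi       # holds phi * (1 - phi)^t
    while weight > tol:
        result = [r + weight * x for r, x in zip(result, v)]
        v = mat_vec(P, v)
        weight *= 1.0 - phi
    return result


def limit_apply(P, f, steps=500):
    """Approximate Pi f = lim_t P^t f by applying P many times."""
    v = list(f)
    for _ in range(steps):
        v = mat_vec(P, v)
    return v


def resolvent_gap(P, f, phi):
    """Sup-norm distance between R f and Pi f."""
    Rf = resolvent_apply(P, f, phi)
    Pif = limit_apply(P, f)
    return max(abs(a - b) for a, b in zip(Rf, Pif))


# Hypothetical two-state chain and test function.
P = [[0.9, 0.1], [0.2, 0.8]]
f = [1.0, 0.0]
for phi in (0.1, 0.01, 0.001):
    print(phi, resolvent_gap(P, f, phi))
```

The gap shrinks roughly in proportion to `phi`, in line with the triangle-inequality estimate: the early terms of the series are weighted by `phi`, and the late terms are already close to the limit.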

We can decompose the transition probability function of the perturbed process as

\[
P_\lambda = (1 - \varphi(\lambda))\,P + \varphi(\lambda)\,Q_\lambda, \qquad \varphi(\lambda) \triangleq 1 - (1-\lambda)^{n}, \tag{3.8}
\]

where $\varphi(\lambda)$ is the probability that at least one agent trembles, and satisfies $\varphi(\lambda) \downarrow 0$ as $\lambda \downarrow 0$. Also,
define the "lifted" transition probability function:

\[
P^{L}_\lambda \triangleq \varphi(\lambda) \sum_{t=0}^{\infty} (1 - \varphi(\lambda))^{t}\, Q_\lambda P^{t} = Q_\lambda R_\lambda,
\]

where $R_\lambda$ was defined in Proposition 3.4 (the equality on the right-hand side is evident by Fubini).
Similarly we decompose $Q_\lambda$ as

\[
Q_\lambda = (1 - \psi(\lambda))\,Q + \psi(\lambda)\,Q^{*}, \qquad \psi(\lambda) \triangleq 1 - \frac{n\lambda(1-\lambda)^{n-1}}{1 - (1-\lambda)^{n}}.
\]

Here $Q$ is the transition probability function induced by aspiration learning where exactly one
player trembles, and $Q^{*}$ is the transition probability function where at least two players tremble
simultaneously.
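
The two mixing weights can be computed directly. The sketch below assumes that the $n$ agents tremble independently, each with probability $\lambda$ (consistent with $\varphi(\lambda) = 1 - (1-\lambda)^{n}$), so that $\psi(\lambda)$ is the conditional probability that at least two agents tremble given that at least one does. The function names are hypothetical.

```python
from math import comb


def phi(lam, n):
    """Probability that at least one of n independent agents trembles."""
    return 1.0 - (1.0 - lam) ** n


def psi(lam, n):
    """Probability that at least two agents tremble, given that at least one does."""
    exactly_one = n * lam * (1.0 - lam) ** (n - 1)
    return 1.0 - exactly_one / phi(lam, n)


def tremble_count_pmf(lam, n, k):
    """Binomial probability that exactly k of the n agents tremble."""
    return comb(n, k) * lam ** k * (1.0 - lam) ** (n - k)


n = 6
for lam in (1e-1, 1e-2, 1e-3):
    print(lam, phi(lam, n), psi(lam, n))
```

Both weights vanish as $\lambda \downarrow 0$, with $\psi(\lambda)$ of order $(n-1)\lambda/2$. This is why single-agent trembles, described by $Q$, dominate the limiting behavior.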

We have the following proposition. 

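Proposition 3.5. The following hold:

(i) For $f \in C(\mathcal{X})$, $\lim_{\lambda \to 0} \|P^{L}_\lambda f - Q\Pi f\|_\infty = 0$.

(ii) Any invariant distribution $\mu_\lambda$ of $P_\lambda$ is also an invariant distribution of $P^{L}_\lambda$.

(iii) Any weak limit point in $\mathcal{P}(\mathcal{X})$ of $\mu_\lambda$, as $\lambda \downarrow 0$, is an invariant probability measure of $Q\Pi$.

Proof. (i) We have

\begin{align*}
\|P^{L}_\lambda f - Q\Pi f\|_\infty &\le \|Q_\lambda(R_\lambda f - \Pi f)\|_\infty + \|Q_\lambda \Pi f - Q\Pi f\|_\infty \\
&\le \|R_\lambda f - \Pi f\|_\infty + \|Q_\lambda \Pi f - Q\Pi f\|_\infty. \tag{3.9}
\end{align*}

The first term on the right hand side of (3.9) tends to 0 as $\lambda \downarrow 0$ by Proposition 3.4, while the second
term does the same by the definition of $Q_\lambda$.

(ii) Multiplying both sides of (3.8) by $R_\lambda$, we have

\[
P_\lambda R_\lambda = R_\lambda - \varphi(\lambda)\,I + \varphi(\lambda)\,P^{L}_\lambda, \tag{3.10}
\]

where $I$ denotes the identity operator. Let $\mu_\lambda$ denote an invariant distribution of $P_\lambda$. Hence, by
(3.10), we have

\[
\mu_\lambda R_\lambda = \mu_\lambda R_\lambda - \varphi(\lambda)\,\mu_\lambda + \varphi(\lambda)\,\mu_\lambda P^{L}_\lambda,
\]

and the second claim follows.

(iii) Let $\hat\mu$ be a limit point of $\mu_\lambda$ as $\lambda \downarrow 0$. For any $f \in C(\mathcal{X})$, we have

\[
\hat\mu[f] - (\hat\mu Q\Pi)[f] = \bigl(\hat\mu[f] - \mu_\lambda[f]\bigr) + \mu_\lambda\bigl[P^{L}_\lambda f - Q\Pi f\bigr] + \bigl(\mu_\lambda[Q\Pi f] - \hat\mu[Q\Pi f]\bigr).
\]

The first and the third terms on the right hand side tend to 0 as $\lambda \downarrow 0$ along some sequence, by
the weak convergence of $\mu_\lambda$ to $\hat\mu$, while the second term is dominated by $\|P^{L}_\lambda f - Q\Pi f\|_\infty$, which also
tends to 0 by part (i). □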

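For $s \in \mathcal{S}$ let $N_\varepsilon(s)$ denote the open $\varepsilon$-neighborhood of $s$ in $\mathcal{X}$. For any two pure strategy states
$s, s' \in \mathcal{S}$, define

\[
\hat{P}_{ss'} \triangleq \lim_{t \to \infty} QP^{t}(s, N_\varepsilon(s'))
\]

for some $\varepsilon > 0$ sufficiently small. By Proposition 3.3, $\hat{P}_{ss'}$ is independent of the selection of $\varepsilon$. Define
also the $|\mathcal{S}| \times |\mathcal{S}|$ stochastic matrix $\hat{P} \triangleq [\hat{P}_{ss'}]$.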

Proposition 3.6. There exists a unique invariant probability measure $\hat\mu$ of $Q\Pi$. It satisfies

\[
\hat\mu(\cdot) = \sum_{s \in \mathcal{S}} \pi_s\, \delta_s(\cdot) \tag{3.11}
\]

for some constants $\pi_s \ge 0$, $s \in \mathcal{S}$. Moreover, $\pi = (\pi_1, \dots, \pi_{|\mathcal{S}|})$ is an invariant distribution of $\hat{P}$,
i.e., $\pi = \pi\hat{P}$.

Proof. By Proposition 3.4 the support of $\Pi$ is $\mathcal{S}$, and so is the support of $Q\Pi$. Thus, for any
sufficiently small $\varepsilon > 0$, $Q\Pi(s, \{s'\}) = Q\Pi(s, N_\varepsilon(s'))$. Since $Q\Pi$ is a Feller transition function it admits
an invariant probability measure, say $\hat\mu$. The support of $\hat\mu$ is also $\mathcal{S}$, and, therefore, it has the form
of (3.11) for some constants $\pi_s \ge 0$, $s \in \mathcal{S}$.

Note also that $N_\varepsilon(s')$ is a continuity set of $Q\Pi(s, \cdot)$, i.e., $Q\Pi(s, \partial N_\varepsilon(s')) = 0$. Therefore, by the
Portmanteau theorem,

\[
Q\Pi(s, N_\varepsilon(s')) = \lim_{t \to \infty} QP^{t}(s, N_\varepsilon(s')) = \hat{P}_{ss'}.
\]

If we also define $\pi_s \triangleq \hat\mu(N_\varepsilon(s))$, then

\[
\pi_{s'} = \hat\mu(N_\varepsilon(s')) = \sum_{s \in \mathcal{S}} \pi_s\, Q\Pi(s, N_\varepsilon(s')) = \sum_{s \in \mathcal{S}} \pi_s \hat{P}_{ss'},
\]

which shows that $\pi$ is an invariant distribution of $\hat{P}$, i.e., $\pi = \pi\hat{P}$.

To establish the uniqueness of the invariant distribution of $Q\Pi$, recall the definition of $Q$. Since
$\mathcal{S}$ is isomorphic with $\mathcal{A}$, we can identify $s \in \mathcal{S}$ with an element $\alpha \in \mathcal{A}$. If agent $i$ trembles, then
all actions in $\mathcal{A}_i$ have positive probability of being selected, i.e., $Q(\alpha, (\alpha_i', \alpha_{-i})) > 0$ for all $\alpha_i' \in \mathcal{A}_i$
and $i \in \mathcal{I}$. It follows by Proposition 3.3 that $Q\Pi(\alpha, (\alpha_i', \alpha_{-i})) > 0$ for all $\alpha_i' \in \mathcal{A}_i$ and $i \in \mathcal{I}$. Finite
induction then shows that $(Q\Pi)^{n}(\alpha, \alpha') > 0$ for all $\alpha, \alpha' \in \mathcal{A}$. It follows that if we restrict the
domain of $Q\Pi$ to $\mathcal{S}$, then $Q\Pi$ defines an irreducible stochastic matrix. Therefore, $Q\Pi$ has a unique
invariant distribution. □

Theorem 3.2 follows from Propositions 3.5 and 3.6. Moreover, Proposition 3.6 shows that the
unique invariant probability measure of $Q\Pi$ agrees with the unique invariant probability distribution
of the finite stochastic matrix $\hat{P}$.
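
Once the finite matrix is available, its invariant distribution can be computed with elementary linear algebra. The following sketch uses power iteration on a "lazy" version of the chain (averaging with the identity, which leaves the invariant distribution unchanged and rules out periodicity). The 3-state matrix is a hypothetical example, not one derived from a game in this paper.

```python
def stationary_distribution(P, max_iters=10_000, tol=1e-13):
    """Invariant distribution pi = pi P of an irreducible stochastic matrix P.

    Iterates pi <- pi (I + P) / 2, which has the same fixed point as pi <- pi P.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iters):
        nxt = [0.5 * pi[j] + 0.5 * sum(pi[i] * P[i][j] for i in range(n))
               for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Hypothetical irreducible 3-state stochastic matrix.
P_example = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]
print(stationary_distribution(P_example))
```

For this example the result is close to (0.25, 0.5, 0.25), which can be confirmed by checking $\pi = \pi P$ by hand.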

Remark 3.2. A result similar to Proposition 3.5(i), based on which Theorem 3.2 was shown,
has also been derived in [13, Theorem 2]. The result in [13], though, assumes incorrectly that the
process $Q$ satisfies the strong Feller property. Note that the proof of Proposition 3.5 does not make
use of any such assumption and provides a corrected analysis for the asymptotic behavior of the
aspiration learning scheme presented in [13].

In the forthcoming sections, we demonstrate the importance of Theorem 3.2 in characterizing
the asymptotic behavior of aspiration learning in large coordination games. Note that prior analysis
of this type of aspiration learning, e.g., in [5], [13], was restricted to two-player, two-action
games.

4. Efficiency in Coordination Games. In this section, we study the asymptotic behavior
of the invariant distribution $\pi$ of $\hat{P}$ in strict coordination games when the step size $\epsilon$ approaches
zero. The aim is to characterize the states in $\mathcal{S}$ that are stochastically stable with respect to the
parameter $\epsilon$. To this end, first denote by $\bar{\mathcal{S}}$ the set of pure strategy states that correspond to $\bar{\mathcal{A}}$.
Clearly, $\bar{\mathcal{S}}$ is isomorphic to $\bar{\mathcal{A}}$. Also, denote by $\mathcal{S}^{*}$ the set of pure strategy states that correspond to
the set of Nash action profiles $\mathcal{A}^{*}$.

We define two constants that are important in the analysis:

\[
\Delta_{\min} \triangleq \min_{i \in \mathcal{I}}\; \min_{\alpha \in \bar{\mathcal{A}},\, \alpha' \in \mathcal{A} \setminus \bar{\mathcal{A}}} \{u_i(\alpha) - u_i(\alpha')\},
\]
\[
\Delta_{\max} \triangleq \max_{i \in \mathcal{I}}\; \max_{\alpha, \alpha' \in \mathcal{A}} |u_i(\alpha') - u_i(\alpha)|.
\]

For strict coordination games $\Delta_{\min} > 0$, and it is the smallest possible payoff decrease from the
dominant payoff due to any deviation from the set of actions in $\bar{\mathcal{A}}$.

To facilitate the analysis we let $\widetilde{\mathbb{P}}_x$ and $\widetilde{\mathbb{E}}_x$ denote the probability and expectation operator,
respectively, on the path space of a Markov process $X_t$ starting at $x \in \mathcal{X}$ at $t = 0$, and governed by
the family of transition probabilities $\{QP^{t} : t \ge 0\}$. In other words, $\widetilde{\mathbb{P}}_x(X_t \in A) = QP^{t-1}(x, A)$ for
any $A \in \mathcal{B}(\mathcal{X})$.

4.1. Two Technical Lemmas. Lemma 4.1 below introduces two new hypotheses. The first
hypothesis corresponds to the case in which payoff differences within the same action profile are
smaller than payoff differences between dominant and non-dominant action profiles. The second
hypothesis corresponds to the case where each player receives a unique payoff within $\bar{\mathcal{A}}$.

Lemma 4.1. Let $\mathcal{G}$ be a strict coordination game satisfying either one of the following two
hypotheses:

(H1) $\delta^{*} \triangleq \max_{i \ne j} \max_{\alpha \in \mathcal{A}} |u_i(\alpha) - u_j(\alpha)| < \Delta_{\min}$.
(H2) $\bar{\mathcal{A}} = \{\alpha \in \mathcal{A} : u_i(\alpha) = \max_{\alpha' \in \mathcal{A}} u_i(\alpha') \;\; \forall i \in \mathcal{I}\}$.

Then, there exists a constant $C_0 = C_0(\delta^{*}, \Delta_{\min}, \Delta_{\max})$ such that if $\zeta < C_0$ then

\[
\hat{P}_{ss'} \to 0 \quad \text{as } \epsilon \downarrow 0, \qquad \text{for all } s \in \bar{\mathcal{S}},\; s' \in \mathcal{S} \setminus \bar{\mathcal{S}}.
\]

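Proof. Suppose (H1) holds. Select $\zeta < \tfrac{1}{2}(\Delta_{\min} - \delta^{*})$. Let $X_0 = s = (\alpha, \rho) \in \bar{\mathcal{S}}$. Without loss
of generality suppose agent 1 trembles. If $r_1(0) \le 0$ the process clearly converges to $s$ as $t \to \infty$
with probability 1. Therefore, suppose $r_1(0) > 0$. Note that for $t \ge 0$ we have

\begin{align*}
|\rho_i(t+1) - \rho_j(t+1)| &\le (1-\epsilon)\,|\rho_i(t) - \rho_j(t)| + \epsilon\,|u_i(\alpha(t)) - u_j(\alpha(t))| \\
&\le (1-\epsilon)\,|\rho_i(t) - \rho_j(t)| + \epsilon\,\delta^{*} \qquad \text{for all } i, j \in \mathcal{I}, \tag{4.1}
\end{align*}

and since $\zeta < \tfrac{1}{2}(\Delta_{\min} - \delta^{*})$, by a straightforward induction argument using (4.1) we obtain

\[
\max_{i, j \in \mathcal{I}} |\rho_i(t) - \rho_j(t)| < \frac{\Delta_{\min} + \delta^{*}}{2} \qquad \forall t \ge 0. \tag{4.2}
\]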

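For $i \in \mathcal{I}$ define

\[
\hat\rho_i \triangleq \min_{\alpha \in \bar{\mathcal{A}}} u_i(\alpha) \qquad \text{and} \qquad \check\rho_i \triangleq \max_{\alpha \in \mathcal{A} \setminus \bar{\mathcal{A}}} u_i(\alpha),
\]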

and for k = 0, 1 define the sets 

n * ft h + ( 2fc + l )h i Ami " + ^* fc r\ 
D fc = |(o,p)6Af : Pi < + , i el J . 
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Let also 



r = |(a,/o) e X ; min(pi - pi,p { ~ pi) > i(A„ 



6*), i 61 , 



and 



f 4 {( a ,p) er :tt ei}. 

Recall the definition of t in (|3.1j) and in order to simplify the notation let Tfc = x(-Dfe), for fc = 0, 1. 
Note the following: Firstly, using (|4.2[) . we obtain 

(4.3) 



rcfl„\Di 

Secondly, since + 1) — Pi(t)\ < eA max , we obtain 
It is also evident that 



Tl ~ To ~ J7a — 1 {^<cx,} > 



{limsup d s (X t ,S\S) = o} C {ti < oo} 



(4.4) 



(4.5) 



where ds is a metric in S. It is clear from the definition of P that if x £ V there are two possibilities: 
If a profile a G A \ A is played, then pi decreases in value for all i € I, or in other words, that 
P(x, r) = 1 for all x & (F fl -DJ) \ f . Otherwise, if a profile in .4 is played, then the sample path gets 
trapped in the domain of attraction of S. This means that if x £ T then ¥ x (xi < oo) = 0, where 
P x is the probability measure induced by P defined in Section [3j In this case, and by ()4.3|) . we also 
have 

P(x,f) >min {|(A min -<5*),l-/i} =7 Vx e T n . 

Thus, using the Markov property we obtain, with to — 4^a°'" 

P to (x,T\T) < (1 -7)*° Va;ernL>J. (4.6) 
Conditioning on and using the strong Markov property, (|4.4|) , (|4.6|) and the foregoing, we obtain 



< 00) < 



U 1 



{ti<oo} I «1 



<E s [P Xto (ti <oo)] 

< sup P x (ti < 00) 

< sup p to (x,r\f) 

xernDi 



< exp 



A m i n 

. 46A max 
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log(l - 7) 



(4.7) 



The result then follows by ([33]) and (gTTJ). 

- - - A 2 

Next, suppose (H2) holds. Note that in this case pi = Ui(a) for all a £ A. Pick any £ < 4A mi " ■ 
As before we may suppose that agent 1 trembles. Let N*(e) = |_V eA °>»»J • Let T be the first time 
that an action profile in A \ A has been played at least N*(e) times. Then, at time f the aspiration 
level of the initially perturbed agent 1 satisfies: 

Pi(*) <Pi+C- eA min N*(e) < p x , 
while the aspiration level of any agent i £ X satisfies 

c 



Pi{%) > Pi - eA 
For k = 0, 1 define the sets 



£ A min 



> Pi - eA max — - — > p t 



I \ r- v p t + {2k+ l)pi . 
D k = \ {a,p) £ X : pi < 2k + 2 ' * ' ' 



and let ffc = T(Dfe), for k = 0, 1. Also define 

f 4/(a,p)6#: ft </Si- A*!_, l£ l 

It is straightforward to show that ¥ S (X^ £ T) = 1. From this point on, we proceed as in the 
previous case. □ 

For the lemma that follows we need to define the following constant. For each a* £ A* \ A, 
select any a £ A and . . . ,j n -i} C T which satisfy Definition 12.21 (c). and define 

An = - ruin min min [uAa*) — m ( a^, , . . . , a^.a* r„- „■ \ )\ . 



£< A AA min _ (4g) 



By Definition E21(c), A > 0. 
Lemma 4.2. Suppose 



Then, for any strict coordination game <£ for which A* \ A ^ 0, there exists a constant Mq = 
M (h, \A\) > such that 

Ps's > , t ,° 7 \ for alls* £S*\S, s£S. 
c£ A (1 — h) 



Proof. Let s* = (a*,p*) £ S* \ S, s = (a,p) £ S. Suppose a £ A and {ji, . . . , j„-i} C I are 
the action profile and sequence of agents, respectively, corresponding to a* used in the calculation 
of An. Consider the sample paths s(t) = (a(t),p(t)*) satisfying s(0) = s* , pj 1 (l) £ {p^tP^ + C)j 
P-j t (1) = p*_j i , and a(t) = \ 5tj 1 ,...,aij t , a*_ ^ , for < i < n. We have 
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By (pj~5]l . p* - pi(t) < A for all i £ 1 and t < n. Therefore, 

Pi(t) - Ui(a(t)) > A for all i e {ji, . . -,jt+i} , 
for < t < n and hence we obtain 

P( s (t-l), 3 (t))>h n - ^ cAo . A . {1 ~ h)) , Kt<n, (4.10) 



and 



By l|4.8p . we have 



^ (cA A(l-h)) n 
P(s(n-l),s) > y K — ■ (4.11) 



Pi - Pi{n) > A min + p* - Pi {n) > V* £ 1 ■ (4.12) 
By ([4~T2"T) . IL(s(n - 1), s) > P(s(n - 1), s). Consequently, the result follows by (|4~9| - (|4~TT| . □ 
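4.2. Main Result. We define inductively the following collection of sets

\[
\mathcal{S}_k \triangleq \Bigl\{ s = (\alpha, \rho) \in \Bigl(\bigcup_{j=0}^{k-1} \mathcal{S}_j\Bigr)^{c} : \exists\, i \in \mathcal{I},\; \alpha_i' \in BR_i(\alpha) \text{ satisfying (2.2) and } (\alpha_i', \alpha_{-i}) \in \mathcal{S}_{k-1} \Bigr\}
\]

for $\mathcal{S}_0 \triangleq \mathcal{S}^{*} \cup \bar{\mathcal{S}}$. For example, $\mathcal{S}_1$ includes all pure strategy states for which there exist an agent
$i$ and an action $\alpha_i' \in BR_i(\alpha)$ which satisfies (2.2) (i.e., makes no other player worse off) and also
$\alpha' = (\alpha_i', \alpha_{-i}) \in \mathcal{S}_0$. Let also $K$ denote the maximum $k$ for which $\mathcal{S}_k$ is non-empty, i.e.,
$K \triangleq \max\{k \in \mathbb{N} : \mathcal{S}_k \ne \emptyset\}$. Such $K$ is well-defined since the set of action profiles $\mathcal{A}$ is finite.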

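Lemma 4.3. In any coordination game, the collection of sets $\{\mathcal{S}_k\}_{k=0}^{K}$ forms a partition of $\mathcal{S}$.

Proof. By definition of the collection $\{\mathcal{S}_k\}_{k=0}^{K}$, the sets $\mathcal{S}_k$ are mutually disjoint. It remains
to show that their union coincides with $\mathcal{S}$. Assume not, i.e., assume that there exists $s \in \mathcal{S}$ such
that $s = (\alpha, \rho) \notin \bigcup_{k=0}^{K} \mathcal{S}_k$. According to the definition of a coordination game and Claim 2.1, there
exists a sequence of action profiles $\{\alpha^{j}\}$, such that $\alpha^{0} = \alpha$ and $\alpha^{j}_i \in BR_i(\alpha^{j-1})$ for some $i \in \mathcal{I}$, which
terminates in $\mathcal{A}^{*} \cup \bar{\mathcal{A}}$. Let $\{s^{j}\}$ denote the sequence of pure strategy states which corresponds to
$\{\alpha^{j}\}$. Then, for some $j^{*}$ we have $s^{j^{*}} \in \mathcal{S}^{*} \cup \bar{\mathcal{S}}$, i.e., $s^{j^{*}} \in \mathcal{S}_0$. Since $s^{j^{*}} \in \mathcal{S}_0$, we should also
have that $s^{j^{*}-1} \in \mathcal{S}_1, \dots, s^{0} = s \in \mathcal{S}_{j^{*}}$. However, this conclusion contradicts our assumption that
$s \notin \bigcup_{k=0}^{K} \mathcal{S}_k$. Thus, $\bigcup_{k=0}^{K} \mathcal{S}_k = \mathcal{S}$ and therefore the collection of sets $\{\mathcal{S}_k\}_{k=0}^{K}$ defines a partition of
$\mathcal{S}$. □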
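
The inductive construction of the layers can be sketched as a backward search from $\mathcal{S}_0$. In the sketch below, `admissible_moves[s]` is a hypothetical stand-in for the set of states reachable from `s` by one unilateral best-reply deviation satisfying (2.2); the states, the moves, and the function name are illustrative assumptions only.

```python
def layer_partition(states, s0, admissible_moves):
    """Return [S_0, S_1, ...], where S_k holds the states outside all earlier
    layers that have at least one admissible move into S_{k-1}."""
    layers = [set(s0)]
    assigned = set(s0)
    while True:
        previous = layers[-1]
        nxt = {s for s in states
               if s not in assigned and admissible_moves.get(s, set()) & previous}
        if not nxt:
            break
        layers.append(nxt)
        assigned |= nxt
    return layers


# Hypothetical example: five states, 'a' plays the role of S_0.
states = ["a", "b", "c", "d", "e"]
moves = {"b": {"a"}, "c": {"b"}, "d": {"b"}, "e": {"d", "a"}}
layers = layer_partition(states, {"a"}, moves)
print(layers)
```

If every state has a chain of admissible moves ending in $\mathcal{S}_0$, as the lemma asserts for coordination games, the returned layers are disjoint and cover all states.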

Theorem 4.4. Let $\mathcal{G}$ be a strict coordination game that satisfies either one of the hypotheses
(H1) or (H2) in Lemma 4.1 and suppose that $\zeta < C_0$. Then $\pi_s \to 0$ as $\epsilon \downarrow 0$ for all $s \notin \bar{\mathcal{S}}$.

Proof. Consider the partition of S defined by the family of sets {Sfc}^L . Let PstSj denote the 
sub-stochastic matrix composed of the transition probabilities P SiSj for Si £ Si and Sj £ Sj. In 
other words [-PsjSjl is the block decomposition of P subordinate to the partition {So, Si . . . ,Sk}- 
Similarly, we define S* = S* \ S, and let 

\Ps*s Ps*s*j 
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denote the block decomposition of Ps S subordinate to the partition (<S, <S*) of So- From: 
we obtain 

By Lemma T4.ll Pss c ~ > as e — > 0, while by Lemma T4.2I for some positive constant S, which does 
not depend on e, we have Pg.^l > SI. Thus, 

Sn s ,l < ns.P^s 1 < ^shs" 1 = 7r s" p S"S 1 > 

and we obtain 

Trj.-J-O ase^O. (4.13) 

Similarly, from the equation ir So = ns Ps S + ^S§Ps§S i we obtain 7rs P,s sgl = 7r 5 cP 5 c 5o l. It is 
straightforward to show, using Definition 12.21 (b), that for some positive constant <5, which does not 
depend on e, we have Ps k g k+1 l > Si for all k > 0. Combining the equations above we get: 

Sn So l < n So P SoSl l < tt So Ps s^ 

= ^S-PsSo 1 + ^S* ^S*S 1 ^ . 

where in the last line we used Lemma |4~T1 and (|4.13|) . Thus, we have shown that irs a — > as e — >• 0. 
We proceed by induction. Suppose irs^ ~ ► as e — > 0. Then, 

^7T5 fc + 1 l < 7T5 fc + lJ P5 fc + 1 5 fc l < 7T Sfe l ^> 0, 

which shows that Tts k+1 — > as e — > 0. By Lemma \A. 31 the proof is complete. □ 

Theorem 4.4 combined with Theorem 3.2 provides a complete characterization of the time-average
asymptotic behavior of aspiration learning in strict coordination games.

4.3. Simulations in Network Formation Games. In this section, we demonstrate the
asymptotic behavior of aspiration learning in coordination games as described by Theorems 3.2 and 4.4.
Consider the network formation game of Section 2.2 which, according to Claim 2.2, is a (non-strict)
coordination game. Although Theorem 4.4 was only shown for strict coordination games, our
intention here is to demonstrate that it also applies to the larger class of (non-strict) coordination
games.

We consider a set of six nodes deployed on the plane, so that the neighbors of each node are the
two immediate nodes (e.g., $\mathcal{N}_1 = \{2, 6\}$). Note that a payoff-dominant set of networks exists and
corresponds to the wheel networks, where each node has a single link. We pick the set $\bar{\mathcal{A}}$ of desirable
networks as the set of wheel networks. Note that the set $\bar{\mathcal{A}}$ satisfies hypothesis (H2) of Lemma 4.1.

Fig. 4.1. A typical response of aspiration learning in the network formation game.

In order for the average behavior to be observed, $\lambda$ and $\epsilon$ need to be sufficiently small. We
choose: $h = 0.01$, $c = 0.2$, $\zeta = 0.01$, $\epsilon = \lambda = 10\text{e-}4$, and $c = 1/8$. In Fig. 4.1 we have plotted a
typical response of aspiration learning for this setup, where the final graph and the aspiration level
as a function of time are shown.

To better illustrate the response of aspiration learning, define the distance from node $j$ to node
$i$, denoted $\mathrm{dist}_G(j, i)$, as the minimum number of hops from $j$ to $i$. We also adopt the convention
$\mathrm{dist}_G(i, i) = 0$ and $\mathrm{dist}_G(j, i) = \infty$ if there is no path from $j$ to $i$ in $G$. The last graph in Fig. 4.1
plots, for each node, the running average of the inverse total distance from all other nodes, i.e.,
$1 / \sum_{j \in \mathcal{I}} \mathrm{dist}_G(j, i)$. This number is zero if the node is disconnected from any other node.

We observe that the payoff-dominant profile (wheel network) is played with frequency that
approaches one. In fact, the aspiration level converges to $(n - 1) - c = 4.875$ and the inverse total
distance converges to $1/15 \approx 0.067$, both of which correspond to the wheel network.
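
The two reference values quoted for the wheel network can be reproduced with a short breadth-first search. The sketch below assumes a directed wheel on six nodes in which node $k$ maintains a single link to its successor; the direction convention, the data layout, and the function name are illustrative assumptions.

```python
from collections import deque


def inverse_total_distance(i, links):
    """Return 1 / sum_j dist(j, i), or 0.0 if some node has no path to i.

    links[j] lists the nodes that j links to; dist(j, i) is the hop count j -> i.
    """
    nodes = list(links)
    reverse = {v: [] for v in nodes}
    for src, dsts in links.items():
        for dst in dsts:
            reverse[dst].append(src)
    dist = {i: 0}
    queue = deque([i])
    while queue:                      # BFS over reversed links, starting at i
        v = queue.popleft()
        for w in reverse[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    if len(dist) < len(nodes):
        return 0.0                    # some distance is infinite
    return 1.0 / sum(dist.values())


# Directed wheel on six nodes: 1 -> 2 -> ... -> 6 -> 1.
wheel = {k: [k % 6 + 1] for k in range(1, 7)}
print([inverse_total_distance(i, wheel) for i in wheel])
print((6 - 1) - 1 / 8)   # payoff (n - 1) - c with n = 6 and c = 1/8
```

In the wheel, the distances to any node are 1, 2, 3, 4, 5, which sum to 15, so each node's inverse total distance is 1/15. Removing any single link makes some distance infinite and the measure drops to zero.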

5. Fairness in Symmetric and Coordination Games. In several coordination games,
establishing convergence (in the way defined by Theorem 3.2) to the set of desirable states $\bar{\mathcal{S}}$ (as
Theorem 4.4 showed) may not be sufficient. For example, in the common-pool games of Section 2.3,
convergence to $\bar{\mathcal{S}}$ does not guarantee that all agents get access to the common resource in a fair
schedule. In the remainder of this section, we establish conditions under which fairness is also
established.

5.1. A Property of Finite Markov Chains. In this section, we provide an approach for
characterizing explicitly the invariant distribution of a finite-state, irreducible and aperiodic Markov
chain. We use a characterization introduced by [7], which has been extensively used for showing
stochastic stability arguments for several learning dynamics, see, e.g., [TH1I28]. In particular, for finite
Markov chains an invariant distribution can be expressed as the ratio of sums of products consisting
of transition probabilities. These products can be described conveniently by means of graphs on the
set of states of the chain.

Let $\mathcal{S}$ be a finite set of states, whose elements are denoted by $s_k$, $s_\ell$, etc., and let $W$ be a subset of
$\mathcal{S}$.

Definition 5.1 (W-graph). A graph consisting of arrows $s_k \to s_\ell$ ($s_k \in \mathcal{S} \setminus W$, $s_\ell \in \mathcal{S}$, $s_\ell \ne s_k$)
is called a W-graph if it satisfies the following conditions:

1. every point $s_k \in \mathcal{S} \setminus W$ is the initial point of exactly one arrow;

2. there are no closed cycles in the graph; or, equivalently, for any point $s_k \in \mathcal{S} \setminus W$ there
exists a sequence of arrows leading from it to some point $s_\ell \in W$.

We denote by $\mathcal{G}\{W\}$ the set of W-graphs; we shall use the letter $g$ to denote graphs.
If $P_{s_k s_\ell}$ are nonnegative numbers, where $s_k, s_\ell \in \mathcal{S}$, define the product

\[
\varpi(g) \triangleq \prod_{(s_k \to s_\ell) \in g} P_{s_k s_\ell}.
\]

The following lemma holds:

Lemma 5.2 (Lemma 6.3.1 in [7]). Let us consider a Markov chain with a finite set of states $\mathcal{S}$
and transition probabilities $\{P_{s_k s_\ell}\}$ and assume that every state can be reached from any other state
in a finite number of steps. Then the stationary distribution of the chain is $\pi = [\pi_s]$, where

\[
\pi_s = \frac{R_s}{\sum_{s_i \in \mathcal{S}} R_{s_i}}, \qquad s \in \mathcal{S},
\]

and $R_s \triangleq \sum_{g \in \mathcal{G}\{s\}} \varpi(g)$.
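
For a small chain the graph formula of Lemma 5.2 can be evaluated by brute force. The sketch below enumerates, for each root state, every assignment of one outgoing arrow to each non-root state, keeps the assignments without cycles, and sums the products of transition probabilities. The 3-state matrix and the function names are hypothetical; the enumeration is exponential in the number of states and is meant only as an illustration.

```python
from itertools import product


def leads_to_root(k, successor, root):
    """Follow arrows from k; True if the root is reached without closing a cycle."""
    seen = set()
    while k != root:
        if k in seen:
            return False
        seen.add(k)
        k = successor[k]
    return True


def graph_weights(P):
    """R_s for every state s: sum over {s}-graphs of the product of arrow probabilities."""
    n = len(P)
    weights = []
    for root in range(n):
        others = [k for k in range(n) if k != root]
        total = 0.0
        for choice in product(range(n), repeat=len(others)):
            successor = dict(zip(others, choice))
            if any(k == l for k, l in successor.items()):
                continue                      # arrows must leave their state
            if all(leads_to_root(k, successor, root) for k in others):
                w = 1.0
                for k, l in successor.items():
                    w *= P[k][l]
                total += w
        weights.append(total)
    return weights


def stationary_from_graphs(P):
    """Stationary distribution pi_s = R_s / sum_i R_i."""
    R = graph_weights(P)
    return [r / sum(R) for r in R]


P_example = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]
print(graph_weights(P_example))
print(stationary_from_graphs(P_example))
```

For this matrix the only $\{s\}$-graphs with nonzero weight are the paths along the chain, so the weights can also be verified by hand.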

5.2. Fairness in Symmetric Games. In this section, using Theorem 3.2 and Lemma 5.2, we
establish fairness in symmetric games, defined as follows:

Definition 5.3 (Symmetric game). A game characterized by the action profile set $\mathcal{A}$ is
symmetric if for any two agents $i, j \in \mathcal{I}$ and any action profile $\alpha \in \mathcal{A}$, the following hold: a) if
$\alpha_i = \alpha_j$, then $u_i(\alpha) = u_j(\alpha)$, and b) if $\alpha_i \ne \alpha_j$, then there exists an action profile $\alpha' \in \mathcal{A} \setminus \{\alpha\}$,
such that the following two conditions are satisfied:

1. $\alpha_i' = \alpha_j$, $\alpha_i = \alpha_j'$ and $\alpha_k' = \alpha_k$ for all $k \ne i, j$.

2. $u_i(\alpha') = u_j(\alpha)$, $u_i(\alpha) = u_j(\alpha')$ and $u_k(\alpha') = u_k(\alpha)$ for any $k \ne i, j$.

Define the following equivalence relation between states in $\mathcal{S}$:

Definition 5.4 (State equivalence). For any two pure-strategy states $s, s' \in \mathcal{S}$ such that $s \ne s'$,
let $\alpha$ and $\alpha'$ denote the corresponding action profiles. We write $s \sim s'$ if there exist $i, j \in \mathcal{I}$, $i \ne j$,
such that the following two conditions are satisfied:

1. $\alpha_i' = \alpha_j$, $\alpha_i = \alpha_j'$ and $\alpha_k' = \alpha_k$ for all $k \ne i, j$.

2. $u_i(\alpha') = u_j(\alpha)$, $u_i(\alpha) = u_j(\alpha')$ and $u_k(\alpha') = u_k(\alpha)$ for any $k \ne i, j$.

Since there is a one-to-one correspondence between $\mathcal{S}$ and $\mathcal{A}$, we also say that two action profiles
$\alpha$ and $\alpha'$ are equivalent if the conditions of Definition 5.4 are satisfied.

Lemma 5.5. For any symmetric game and for any two pure-strategy states $s, s' \in \mathcal{S}$ such that
$s \sim s'$, $\pi_s = \pi_{s'}$.

Proof. Let us consider any two pure strategy states $s, s' \in \mathcal{S}$ such that $s \sim s'$. Let us also consider
any $\{s\}$-graph $g$, i.e., $g \in \mathcal{G}\{s\}$. Such a graph can be identified as a collection of paths, i.e., for some
$M \ge 1$, we have $g = \bigcup_{m=1}^{M} g_m$, where

\[
g_m \triangleq \bigcup_{\ell=1}^{L(m)-1} \bigl(s_{\kappa_m(\ell)} \to s_{\kappa_m(\ell+1)}\bigr)
\]

for some $L(m) > 1$. In the above expression, the function $\kappa_m$ provides an enumeration of the states
that belong to the path $g_m$. Note that due to the definition of $\mathcal{G}\{s\}$-graphs, we should have that
$s_{\kappa_m(L(m))} = s$ for all $m = 1, \dots, M$. Moreover, if $M > 1$, we should also have

\[
\bigcap_{m=1}^{M} \bigl\{s_{\kappa_m(1)}, \dots, s_{\kappa_m(L(m)-1)}\bigr\} = \emptyset,
\]

i.e., the paths in the collection $\{g_m\}$ do not cross each other, except at node $s$.

Let us consider any other state $s' \in \mathcal{S}$ such that $s' \sim s$. Since the game is symmetric, for any
graph $g \in \mathcal{G}\{s\}$, there exists a unique graph $g' \in \mathcal{G}\{s'\}$ which satisfies $g' = \bigcup_{m=1}^{M} g_m'$, where

\[
g_m' \triangleq \bigcup_{\ell=1}^{L(m)-1} \bigl(s'_{\kappa_m(\ell)} \to s'_{\kappa_m(\ell+1)}\bigr)
\]

and $s_{\kappa_m(\ell)} \sim s'_{\kappa_m(\ell)}$, $\ell = 1, \dots, L(m)$, for all $m \in \{1, \dots, M\}$.

The transition probability between any two states is a sum of probabilities of sequences of
action profiles. Since the game is symmetric, for any such sequence of action profiles which leads,
for instance, from $s_{\kappa_m(\ell)}$ to $s_{\kappa_m(\ell+1)}$, there exists an equivalent sequence of action profiles which
leads from $s'_{\kappa_m(\ell)}$ to $s'_{\kappa_m(\ell+1)}$. Therefore, we should have that

\[
P_{s'_{\kappa_m(\ell)} s'_{\kappa_m(\ell+1)}} = P_{s_{\kappa_m(\ell)} s_{\kappa_m(\ell+1)}}, \qquad \ell = 1, \dots, L(m)-1,
\]

for any $m = 1, \dots, M$, and hence, $\varpi(g') = \varpi(g)$. In other words, there exists an isomorphism
between the graphs in the sets $\mathcal{G}\{s\}$ and $\mathcal{G}\{s'\}$ such that any two isomorphic graphs have the same
transition probability. Thus, by Lemma 5.2, we have $\pi_s = \pi_{s'}$ for any two states $s, s'$ such that $s \sim s'$. □
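
The conclusion of the lemma has a simple finite-chain analogue: if relabeling two states leaves the transition matrix unchanged, the stationary distribution assigns them equal weight. The sketch below checks this on a hypothetical 3-state matrix that is invariant under swapping states 0 and 2. The matrix and function names are illustrative assumptions and are not derived from a particular game.

```python
def is_invariant_under(P, perm, tol=1e-12):
    """Check P[perm[i]][perm[j]] == P[i][j] for all i, j (a relabeling symmetry)."""
    n = len(P)
    return all(abs(P[perm[i]][perm[j]] - P[i][j]) < tol
               for i in range(n) for j in range(n))


def stationary(P, steps=500):
    """Power iteration pi <- pi P; adequate here because all entries of P are positive."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(steps):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


# Hypothetical matrix that is symmetric under the swap 0 <-> 2.
P_sym = [
    [0.6, 0.3, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
]
swap = [2, 1, 0]
print(is_invariant_under(P_sym, swap))
print(stationary(P_sym))
```

Here the stationary distribution is (2/7, 3/7, 2/7): the two states related by the symmetry receive the same weight, while the remaining state may differ.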

Lemma 5.5 can be used to provide a more explicit characterization of the invariant distribution
$\pi$ in several classes of coordination games which are also symmetric, e.g., common-pool games.

5.3. Fairness in Common-Pool Games. First, recall that in common-pool games we define
the set of "desirable" or "successful" action profiles $\bar{\mathcal{A}}$ as in (2.4). To characterize more explicitly
the invariant distribution $\pi$, we define the subset of pure-strategy states $\bar{\mathcal{S}}_i$ that correspond to
"successful" states for agent $i$ by

\[
\bar{\mathcal{S}}_i \triangleq \{s \in \mathcal{S} : \alpha_i > \alpha_j, \;\forall j \ne i\}.
\]

In other words, $\bar{\mathcal{S}}_i$ corresponds to the set of pure-strategy states in which the action of agent $i$ is
strictly larger than the action of any other agent $j \ne i$. We also define $\bar{\mathcal{S}} \triangleq \bigcup_{i \in \mathcal{I}} \bar{\mathcal{S}}_i$.

Note that the equivalence relation $\sim$ defines an isomorphism among the states of any two sets
$\bar{\mathcal{S}}_i$ and $\bar{\mathcal{S}}_j$ for any $i \ne j$. This is due to the fact that for any state $s_i \in \bar{\mathcal{S}}_i$, there exists a unique state
$s_j \in \bar{\mathcal{S}}_j$ such that $s_i \sim s_j$.

Lemma 5.6. For any common-pool game, $\pi_{\bar{\mathcal{S}}_1} = \cdots = \pi_{\bar{\mathcal{S}}_n}$.

Proof. As already mentioned, for any $i, j \in \mathcal{I}$ such that $i \ne j$ and for any state $s_i \in \bar{\mathcal{S}}_i$, there
exists a unique state $s_j \in \bar{\mathcal{S}}_j$ such that $s_i \sim s_j$. Therefore, the sets $\bar{\mathcal{S}}_i$ and $\bar{\mathcal{S}}_j$ are isomorphic with
respect to the equivalence relation $\sim$. Since a common-pool game is symmetric, from Lemma 5.5
we conclude that $\pi_{\bar{\mathcal{S}}_1} = \cdots = \pi_{\bar{\mathcal{S}}_n}$. □

Theorem 5.7. Let $\mathcal{G}$ be a common-pool game which satisfies hypothesis (H1) of Lemma 4.1.
There exists a constant $C_0 > 0$ such that for any $\zeta < C_0$, $\pi_{\bar{\mathcal{S}}_i} \to 1/n$ as $\epsilon \downarrow 0$, for all $i \in \mathcal{I}$.

Proof. First, recognize that the sets $\{\bar{\mathcal{S}}_i\}$ are mutually disjoint, and $\bigcup_{i=1}^{n} \bar{\mathcal{S}}_i = \bar{\mathcal{S}}$. Then, by
Theorem 4.4 and for any $\zeta < \tfrac{1}{2}(\Delta_{\min} - \delta^{*})$, we have $\pi_{\bar{\mathcal{S}}} = \sum_{i=1}^{n} \pi_{\bar{\mathcal{S}}_i} \to 1$ as $\epsilon \downarrow 0$. Lastly, by
Lemma 5.6 the conclusion follows. □

In other words, we have shown that the invariant distribution $\pi$ puts equal weight on either
agent "succeeding," which establishes a form of fairness over time. Moreover, it puts zero weight
on states outside $\bar{\mathcal{S}}$ (i.e., states which correspond to collisions) as $\epsilon \to 0$.

5.4. Simulations in Common-Pool Games. Theorems 3.2 and 5.7 provide a characterization
of the asymptotic behavior of aspiration learning in common-pool games as $\lambda$ and $\epsilon$ approach
zero. In fact, according to Remark 3.1, the expected percentage of time that aspiration learning
spends in any one of the pure strategy sets $\bar{\mathcal{S}}_i$ should be equal as the perturbation probability $\lambda \to 0$
and $t \to \infty$ (i.e., fairness is established). Moreover, the expected percentage of "failures" (i.e., states
outside $\bar{\mathcal{S}}$) approaches zero as $t \to \infty$.

We consider the following setup for aspiration learning: $\lambda = 0.001$, $\epsilon = 0.001$, $h = 0.01$,
$c = 0.05$, and $\zeta = 0.05$. Also, we consider a common-pool game of 2 players and 4 actions, where
$c_0 = 0$, $c_1 = 0.1$, $c_2 = 0.2$, $c_3 = 0.3$ and $\tau_0 = \tau_1 = \tau_2 = \tau_3 = 0.8$. Note that the maximum payoff
difference within the same action profile is $\delta^{*} = 0.1$, and the minimum payoff difference between
$\bar{\mathcal{A}}$ and $\mathcal{A} \setminus \bar{\mathcal{A}}$ is $\Delta_{\min} = 0.6$. Therefore, the hypotheses of Theorem 5.7 are clearly satisfied since
$\delta^{*} < \Delta_{\min}$ and $\zeta < \tfrac{1}{2}(\Delta_{\min} - \delta^{*})$. Under this setup, Fig. 5.1 demonstrates the response of aspiration
learning. We observe, as Theorem 5.7 predicts, that the frequency with which either agent succeeds
approaches $1/2$ as time increases. Also, the frequency of collisions (i.e., the joint actions in which
neither agent succeeds) approaches zero as time increases.

Fig. 5.1. A typical response of aspiration learning in a common-pool game with 2 players and 4 actions.

6. Conclusions. We introduced an aspiration learning algorithm and analyzed its asymptotic
behavior in games of multiple players and actions. The main contribution of this analysis was the
establishment of a relation between the time-average behavior of the induced infinite-state Markov
chain and the invariant distribution of a finite-state Markov chain. This relation allowed us to
characterize the asymptotic properties of aspiration learning when applied to
generic coordination games. In particular, we showed that over time, the efficient payoff profiles are
played (in expectation) with a frequency that can become arbitrarily large. This analysis extended
(and corrected) prior results on aspiration learning which primarily focused on games of two players
and two actions. We further demonstrated these results through simulations on network formation
games, where distributed convergence to efficient networks is of particular interest. Finally, we
provided conditions under which fair outcomes can be established in symmetric coordination games
where coincidence of interest among players is not so strong. For example, we showed that in
common-pool games, where multiple players compete over utilizing a limited resource, the expected
frequency at which the common resource is exploited successfully is equally divided among players
as time increases, which establishes a form of fairness.